Colin Ho
05/29/2024, 11:21 PM
`flexible=true` flag on the async CSV reader: https://github.com/Eventual-Inc/Daft/pull/2326, ptal and lmk if you have any comments/feedback regarding the implementation.
jay
05/29/2024, 11:54 PM
`flexible` or `skip`. Is the current `flexible` behavior to append nulls for the remaining rows? What happens then if the schema doesn’t match for the first N rows that do make it in? 😬
Colin Ho
05/30/2024, 12:22 AM
`flexible` is a little easier since the csv-async crate natively supports it https://docs.rs/csv-async/latest/csv_async/struct.AsyncReaderBuilder.html#method.flexible . Skip should be doable, it just needs a little more work.
And yes, flexible will append nulls if a given row doesn't have enough columns, and if a value's dtype doesn't match the schema, it should also be nulled. For example, if column 1 is supposed to be an int and the value for column 1 in a particular row is "a", that entry will be nulled. I'll definitely add tests for these cases though.
Colin Ho
05/30/2024, 12:31 AM
jay
05/30/2024, 5:46 PM
Colin Ho
05/30/2024, 5:49 PM
Colin Ho
05/30/2024, 9:42 PM
Colin Ho
06/03/2024, 6:22 PM
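
For reference, the `flexible` semantics described in the 05/30 12:22 AM message (short rows padded with nulls, values that don't fit the column dtype nulled) can be sketched in plain Python. This is only an illustration using the stdlib `csv` module, not Daft's reader or the csv-async crate. `SCHEMA`, `read_flexible`, and the sample data are hypothetical names made up for the example, and dropping extra trailing fields is an assumption the thread doesn't cover.

```python
import csv
import io

# Hypothetical schema for illustration: column name -> Python type standing in for a dtype.
SCHEMA = {"id": int, "name": str, "score": float}


def read_flexible(text, schema):
    """Parse CSV text, using None for any missing or uncastable field."""
    names = list(schema)
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row; this sketch trusts the given schema instead
    rows = []
    for record in reader:
        # Pad short rows with None so every row has one slot per schema column.
        # Assumption: extra trailing fields are dropped; the thread does not say.
        padded = (record + [None] * len(names))[: len(names)]
        row = {}
        for name, raw in zip(names, padded):
            try:
                row[name] = None if raw is None else schema[name](raw)
            except ValueError:
                # The value does not fit the column's dtype, e.g. "a" in an int column.
                row[name] = None
        rows.append(row)
    return rows


# Row 2 is missing its last column; row 3 has "a" where an int is expected.
data = "id,name,score\n1,alice,9.5\n2,bob\na,carol,7.0\n"
rows = read_flexible(data, SCHEMA)
```

Here the second data row comes back with `score` set to `None` and the third with `id` set to `None`, while the rest of each row is kept.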