# daft-dev
c
@jay draft pr to read a CSV with variable number of columns using a `flexible=true` flag on the async CSV reader: https://github.com/Eventual-Inc/Daft/pull/2326, ptal and lmk if you have any comments/feedback regarding the implementation.
🙌 1
🔥 1
j
Reviewed! Main question was if we wanted an option for `flexible` or `skip`. Is the current `flexible` behavior to append nulls for the remaining rows? What happens then if the schema doesn’t match for the first N rows that do make it in? 😬
c
`flexible` is a little easier since the csv-async crate natively supports it: https://docs.rs/csv-async/latest/csv_async/struct.AsyncReaderBuilder.html#method.flexible . Skip should be doable, it just needs a little more work. And yes, flexible will append nulls if a given row doesn't have enough columns, and if the dtype of a value in a row doesn't match the schema, that value should also be nulled. For example, if column 1 is supposed to be an int and the value for column 1 in a particular row is `"a"`, that entry will be nulled. I'll definitely add tests for these cases though.
actually the logic for parsing columns from csv already handles values that don't match the column's datatype in the schema: https://github.com/Eventual-Inc/Daft/blob/main/src/daft-csv/src/read.rs#L498-L508
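to make the two cases concrete, here's a rough sketch of the behavior described above (short rows padded with nulls, values that fail to parse as the column dtype nulled). this is plain Python with the stdlib `csv` module, not the Daft or csv-async code; `read_flexible`, `SAMPLE`, and `SCHEMA` are made-up names for illustration, and ignoring extra trailing columns is an assumption of the sketch, not something confirmed for the PR.

```python
import csv
import io


def read_flexible(text, schema):
    """Parse CSV text against a schema of (column_name, cast_fn) pairs.

    Hypothetical helper for illustration only:
    - rows with too few columns get None for the missing ones
    - values that fail to cast to the column dtype become None
    - extra trailing columns are ignored (an assumption of this sketch)
    """
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row, if any
    rows = []
    for record in reader:
        row = {}
        for i, (name, cast) in enumerate(schema):
            if i >= len(record):
                # row is shorter than the schema: pad with a null
                row[name] = None
                continue
            try:
                row[name] = cast(record[i])
            except ValueError:
                # value does not match the column dtype: null it
                row[name] = None
        rows.append(row)
    return rows


SAMPLE = "id,name,score\n1,alice,3.5\n2,bob\na,carol,4.0\n"
SCHEMA = [("id", int), ("name", str), ("score", float)]
ROWS = read_flexible(SAMPLE, SCHEMA)

for row in ROWS:
    print(row)
```

the second data row is missing `score`, and the third has `"a"` where an int is expected, so those are the two entries that come back as nulls. the tests for the PR could probably be shaped around the same two inputs.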
j
LMK when ready for review!
c
working on it now, i got sidetracked with another issue
hey @Clark Zinzow! do you think you could help review this PR since Jay is OOO this week?
👍 1