# general
s
Is there a best way to write to a database and do some upserts from a Daft dataframe?
j
Haha, that’s a very general question, and it depends on what database you’re targeting!
s
I see. Let's say our database is Postgres / SQL Server, and maybe we're on AWS RDS.
j
We currently support writing out to Parquet, so you can run an ingestion into those databases from the Parquet files. That being said, I wonder if it may be possible to leverage something like ADBC or JDBC. Upserts are even more interesting, because they involve not just an append but some kind of upsert API to specify criteria for updating/deleting rows.
Do you have an API in mind?
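For reference, a minimal sketch of that Parquet-then-ingest path, assuming a Postgres target and the `adbc-driver-postgresql` package; the paths, connection URI, and table name are placeholders:

```python
import daft
import pyarrow.parquet as pq
import adbc_driver_postgresql.dbapi as pg  # assumption: pip install adbc-driver-postgresql

df = daft.read_parquet("s3://bucket/source/*.parquet")  # hypothetical source data

# Step 1: write out to Parquet, which Daft supports natively
df.write_parquet("/tmp/staging/")

# Step 2: bulk-append the staged files into Postgres via ADBC
conn = pg.connect("postgresql://user:pass@host:5432/db")  # placeholder URI
with conn.cursor() as cur:
    table = pq.read_table("/tmp/staging/")
    cur.adbc_ingest("target_table", table, mode="append")  # append only, no upsert
conn.commit()
conn.close()
```

Note that `adbc_ingest` only appends or creates tables; the upsert criteria discussed above would still have to live in SQL on the database side.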
s
Because I want to replace our ETL stack (SSIS); currently our main target is a database. I think we can use ADBC or JDBC, or maybe ODBC itself via pyodbc.
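As a rough sketch of what the pyodbc route could look like for upserts against SQL Server, using a `MERGE` statement (the DSN, table, and columns here are made up; Postgres would use `INSERT ... ON CONFLICT` instead):

```python
import pyodbc  # assumption: an ODBC driver for SQL Server is installed

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myhost;DATABASE=mydb;UID=user;PWD=pass"  # placeholder credentials
)
cur = conn.cursor()
cur.fast_executemany = True  # batch the parameterized statement efficiently

rows = [(1, "a"), (2, "b")]  # hypothetical (id, val) batch
cur.executemany(
    """
    MERGE target_table AS t
    USING (SELECT ? AS id, ? AS val) AS s
        ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET val = s.val
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);
    """,
    rows,
)
conn.commit()
```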
j
Do you require upserts? Or just appends?
s
Both, depending on the change request. Is Daft production grade?
I'm considering Daft over Polars, since later we plan to do distributed processing.
j
Yes, we’re more scalable and faster than Polars, but we're a younger project so we may be behind Polars in some features (e.g. upsert APIs), and we primarily work with object stores (S3). Could you make an issue on the Daft repository? Would love to get your thoughts on a suitable API!
s
Yeah.. maybe I can make an API to communicate with a SQL database, with some help from ChatGPT too haha. I'll think about it first. Looking at the Daft documentation, Daft can read SQL via ConnectorX or SQLAlchemy; I think writing to SQL could work the same way, yeah?
j
Yes, we actually read SQL in parallel 😎 Using the user-provided query, we can shard it and perform a parallel read across the distributed cluster. We just need to make a good API to figure out the writing story here (appends and upserts). Every Python API I’ve seen for upserts has been very ugly haha.
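That parallel read looks roughly like this with `daft.read_sql`; the query, connection string, and partition column below are placeholders:

```python
import daft

# Daft shards the user-provided query on partition_col and
# reads the resulting ranges in parallel across the cluster
df = daft.read_sql(
    "SELECT * FROM orders",                 # hypothetical query
    "postgresql://user:pass@host:5432/db",  # placeholder connection string
    partition_col="order_id",               # numeric/temporal column to shard on
    num_partitions=8,
)
print(df.collect())
```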
s
I've benchmarked the Dask / pandas / Polars to_sql functions; they can't support an upsert process, just fail / replace / append.. Is that right?
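For context, that limitation is visible right in pandas' `to_sql` signature: `if_exists` only accepts `"fail"`, `"replace"`, or `"append"`. A minimal example with a placeholder SQLite database:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///example.db")  # placeholder database
df = pd.DataFrame({"id": [1, 2], "val": ["a", "b"]})

# if_exists only accepts "fail", "replace", or "append" -- no upsert mode
df.to_sql("target_table", engine, if_exists="append", index=False)
```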
j
I believe so, yes. Most databases would support appends through JDBC or ODBC, I think.