Cory Grinstead
05/23/2024, 8:40 PM
_platform_memmove calls.
for the code snippet
let file = "Daft/lineitem.parquet";
let io_config = IOConfig::default();
let io_client = Arc::new(IOClient::new(io_config.into())?);
let runtime_handle = daft_io::get_runtime(true)?;
let table = read_parquet(
file,
None,
None,
None,
None,
None,
io_client,
None,
runtime_handle,
Default::default(),
)?;
jay
05/23/2024, 8:48 PM
Cory Grinstead
05/23/2024, 8:49 PM
jay
05/23/2024, 9:00 PM
read_parquet call.
E.g. what percentage of a Daft daft.read_parquet("…").collect() end-to-end call through the dataframe API is spent in the Rust read_parquet() call?
Cory Grinstead
05/23/2024, 9:03 PM
jay
05/23/2024, 9:03 PM
jay
05/23/2024, 9:03 PM
--native flag, but I think you need to run it on a Linux box for this to work
jay
05/23/2024, 9:05 PM
Cory Grinstead
05/23/2024, 9:09 PM
N row groups would be processed as a single table.
Cory Grinstead
05/23/2024, 9:35 PM
Cory Grinstead
05/23/2024, 9:46 PM
jay
05/23/2024, 9:47 PM
Cory Grinstead
05/23/2024, 10:11 PM
jay
05/23/2024, 10:12 PM
daft.read_parquet working? 😮
Cory Grinstead
05/23/2024, 10:14 PM
memray but it doesn't capture the native symbols as well as cargo flamegraph
import daft
print(daft.read_parquet('../Daft/lineitem.parquet').collect())
Cory Grinstead
05/23/2024, 10:15 PM
jay
05/23/2024, 10:16 PM
jay
05/23/2024, 10:17 PM
Start time: 2024-05-23 16:56:01.040000
End time: 2024-05-23 16:56:22.031000
Hmm ok so Daft took about 21 seconds to read the entire file into memory
Cory Grinstead
05/23/2024, 10:18 PM
Cory Grinstead
05/23/2024, 10:54 PM
Cory Grinstead
05/29/2024, 3:13 PM
concat is definitely the biggest performance killer. There are a few other smaller optimizations that come to mind, but I think if we could add support for "chunked" arrays, it'd greatly increase performance.
Unfortunately, I don't think I have enough familiarity with the core codebase yet to make these changes.
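For reference, here is a rough sketch of the general "chunked" array idea mentioned above, in plain Python with only the standard library. It is not Daft's implementation or API: `ChunkedArray`, `append_chunk`, and `to_contiguous` are hypothetical names made up for illustration. The assumption is that each row group arrives as its own buffer, and that keeping those buffers as separate chunks avoids copying them into one contiguous array until (and unless) something actually needs that.

```python
from bisect import bisect_right


class ChunkedArray:
    """Hypothetical illustration: one logical column stored as a list of chunks."""

    def __init__(self):
        self._chunks = []
        # Cumulative lengths: _offsets[i] is the logical start index of chunk i,
        # and _offsets[-1] is the total length.
        self._offsets = [0]

    def append_chunk(self, chunk):
        # Stores a reference to the chunk; existing data is not copied.
        if len(chunk) == 0:
            return
        self._chunks.append(chunk)
        self._offsets.append(self._offsets[-1] + len(chunk))

    def __len__(self):
        return self._offsets[-1]

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("ChunkedArray index out of range")
        # Binary search for the chunk that contains the logical index.
        chunk_idx = bisect_right(self._offsets, index) - 1
        return self._chunks[chunk_idx][index - self._offsets[chunk_idx]]

    def __iter__(self):
        # Iterate chunk by chunk without materializing a combined buffer.
        for chunk in self._chunks:
            yield from chunk

    def to_contiguous(self):
        # The single place where a full copy happens, only on request.
        out = []
        for chunk in self._chunks:
            out.extend(chunk)
        return out


# Usage: each appended list stands in for one row group's column data.
column = ChunkedArray()
column.append_chunk([1, 2, 3])
column.append_chunk([4, 5])
print(len(column), column[3], list(column))
```

The trade-off in this sketch is that indexing needs a binary search over the chunk offsets, while appending a new row group costs nothing proportional to the data already held.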