avril
04/18/2024, 11:35 AM
s3
The following Dask code works and returns the first 5 rows of the dataframe:
import dask.dataframe as dd

# Anonymous (unsigned) access to the public bucket
ddf = dd.read_parquet(
    "s3://dask-data/nyc-taxi/nyc-2015.parquet/part.*.parquet",
    storage_options={"anon": True},
)
ddf.head()
When I try to read the same files with Daft, my Jupyter kernel dies. This also happens when I run the Daft code before running the Dask code.
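import daft
from daft.io import IOConfig, S3Config

# Anonymous (unsigned) access to the public bucket
io_config = IOConfig(s3=S3Config(anonymous=True))
df = daft.read_parquet(
    "s3://dask-data/nyc-taxi/nyc-2015.parquet/part.*.parquet",
    io_config=io_config,
)
df.show()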
Is there a syntax difference here that I should be aware of? Happy to create a GH issue if this merits further investigating!
I'm running Daft 0.2.21
avril
04/18/2024, 12:56 PM
io_config = IOConfig(s3=S3Config(anonymous=True))
df = daft.read_parquet(
    "s3://coiled-datasets/uber-lyft-tlc/",
    io_config=io_config,
)
df.show()
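Since the read takes the whole Jupyter kernel down, one way to keep investigating is to run it in a separate Python process and inspect how that process exits. The following is a minimal sketch using only the standard library. The helper names `run_isolated` and `describe_exit` and the `DAFT_SNIPPET` string are hypothetical and introduced here for illustration; they are not part of Daft.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 600.0) -> tuple[int, str]:
    """Run a Python snippet in a fresh interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode: int) -> str:
    """Translate a child's return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Hypothetical snippet mirroring the read from this thread. It is only defined
# here, not executed; pass it to run_isolated() to try it.
DAFT_SNIPPET = """
import daft
from daft.io import IOConfig, S3Config

io_config = IOConfig(s3=S3Config(anonymous=True))
df = daft.read_parquet(
    "s3://dask-data/nyc-taxi/nyc-2015.parquet/part.*.parquet",
    io_config=io_config,
)
df.show()
"""
```

Calling `run_isolated(DAFT_SNIPPET)` from a notebook cell should leave the kernel alive even if the child process aborts, and the returned stderr text can be pasted into a GH issue.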
jay
04/18/2024, 4:57 PM
avril
04/18/2024, 4:58 PM
jay
04/18/2024, 5:00 PM
avril
04/18/2024, 5:08 PM
jay
04/18/2024, 5:12 PM
ScanWithTask-LocalLimit [Stage:2]: 0%| | 0/1 [00:00<?, ?it/s]thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
left: `8`,
right: `0`', /Users/runner/.cargo/git/checkouts/arrow2-4f48cbcad4539e8b/c0764b0/src/io/parquet/read/deserialize/primitive/basic.rs:53:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Rayon: detected unexpected panic; aborting
[1] 51086 abort ipython
/Users/jaychia/.pyenv/versions/3.10.8/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
With RUST_BACKTRACE=1 set:
ScanWithTask-LocalLimit [Stage:2]: 0%| | 0/1 [00:00<?, ?it/s]thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
left: `8`,
right: `0`', /Users/runner/.cargo/git/checkouts/arrow2-4f48cbcad4539e8b/c0764b0/src/io/parquet/read/deserialize/primitive/basic.rs:53:9
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
4: arrow2::io::parquet::read::deserialize::utils::next
5: <arrow2::io::parquet::read::deserialize::primitive::basic::Iter<T,I,P,F> as core::iter::traits::iterator::Iterator>::next
6: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
7: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
9: <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute
10: rayon_core::registry::WorkerThread::wait_until_cold
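For a bug report it can help to pull the assertion message and source location out of captured stderr. The sketch below parses the quoted panic format seen in the output above (`panicked at '<message>', <file>:<line>:<column>`). The names `PanicInfo` and `parse_panic` are hypothetical, and the pattern is an assumption based only on this log; newer Rust toolchains print the location before the message, which this pattern would not match.

```python
import re
from typing import NamedTuple, Optional


class PanicInfo(NamedTuple):
    message: str
    file: str
    line: int
    column: int


# Matches the older quoted panic format: panicked at '<message>', <file>:<line>:<col>
# DOTALL lets the message span several lines (as the left/right values do here).
_PANIC_RE = re.compile(
    r"panicked at '(?P<message>.*?)', (?P<file>\S+):(?P<line>\d+):(?P<column>\d+)",
    re.DOTALL,
)


def parse_panic(stderr: str) -> Optional[PanicInfo]:
    """Return the first panic found in stderr, or None if there is none."""
    match = _PANIC_RE.search(stderr)
    if match is None:
        return None
    # Collapse the multi-line assertion message onto a single line.
    message = " ".join(match.group("message").split())
    return PanicInfo(
        message=message,
        file=match.group("file"),
        line=int(match.group("line")),
        column=int(match.group("column")),
    )
```

Applied to the output above, this would report the failed `left == right` assertion at line 53, column 9 of arrow2's `primitive/basic.rs`.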
jay
04/18/2024, 5:14 PM