Commit
apacheGH-41317: [C++] Fix crash on invalid Parquet file (apache#41366)
### Rationale for this change

Fixes the crash detailed in apache#41317 in TableBatchReader::ReadNext() on a corrupted Parquet file.

### What changes are included in this PR?

Add a validation that all read columns have the same size.

### Are these changes tested?

I've tested on the reproducer I provided in apache#41317 that it now triggers a clean error:

```
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    [_ for _ in parquet_file.iter_batches()]
  File "test.py", line 3, in <listcomp>
    [_ for _ in parquet_file.iter_batches()]
  File "pyarrow/_parquet.pyx", line 1587, in iter_batches
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: columns do not have the same size
```

I'm not sure if/how unit tests for corrupted datasets should be added.

### Are there any user-facing changes?

No

**This PR contains a "Critical Fix".**

* GitHub Issue: apache#41317

Authored-by: Even Rouault <[email protected]>
Signed-off-by: mwish <[email protected]>
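
For illustration only, the kind of check described above (all read columns must have the same size) can be sketched in standalone C++. This is not the code from the PR: `ColumnSketch` and `CheckSameColumnLength` are hypothetical names, and the sketch uses only the standard library rather than Arrow's own types and status handling. It only reuses the error text shown in the traceback.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoded column; this is not an Arrow type.
struct ColumnSketch {
  std::string name;
  int64_t length;
};

// Hypothetical helper: returns std::nullopt when every column has the same
// length, otherwise an error message mirroring the one in the traceback.
std::optional<std::string> CheckSameColumnLength(
    const std::vector<ColumnSketch>& columns) {
  if (columns.empty()) {
    return std::nullopt;  // nothing to compare
  }
  const int64_t expected = columns.front().length;
  for (const auto& column : columns) {
    if (column.length != expected) {
      return std::string("columns do not have the same size");
    }
  }
  return std::nullopt;
}
```

Running a check like this before iterating over batches turns a mismatch caused by a corrupted file into a reportable error instead of an out-of-bounds read later on.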