-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41225: [C#] Slice value buffers when writing sliced list or binary arrays in IPC format #41230
Conversation
|
keyBuilder.Append("0"); | ||
valueBuilder.Append(0); | ||
|
||
if (Length > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was needed because otherwise tests that create zero-length arrays would fail when doing a strict comparison, as the values array would be sliced so would not exactly match the input. It seems logical to only add an extra item to the last list if there is actually a list to add to.
The similar change to the ListViewType
above wasn't strictly needed but was just made for consistency.
The ListType
data generation also had the same behaviour before but I've made that generate more complex lists with nulls and a range of list sizes including empty lists, for more confidence that the code is correct.
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 56186b9. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
… with non-zero offsets (#41259) ### Rationale for this change After #41230 , the integration tests are failing on main . ### What changes are included in this PR? The bit-by-bit comparison branch in the validity bitmap comparison is missing an offset on one side of the comparison. This PR adds that offset back in. ### Are these changes tested? Via the integration test CI job. ### Are there any user-facing changes? No * GitHub Issue: #41258 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…ero length offsets to IPC format (#41303) ### Rationale for this change Fixes the integration test failures caused by #41230 ### What changes are included in this PR? Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format. ### Are these changes tested? Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from #41264), and also added new unit tests for this. ### Are there any user-facing changes? This may also be a bug that affects users but it isn't in a released version. * GitHub Issue: #41302 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
… arrays in IPC format (#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: #41225 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…ero length offsets to IPC format (#41303) ### Rationale for this change Fixes the integration test failures caused by #41230 ### What changes are included in this PR? Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format. ### Are these changes tested? Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from #41264), and also added new unit tests for this. ### Are there any user-facing changes? This may also be a bug that affects users but it isn't in a released version. * GitHub Issue: #41302 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…uffers with non-zero offsets (apache#41259) ### Rationale for this change After apache#41230 , the integration tests are failing on main . ### What changes are included in this PR? The bit-by-bit comparison branch in the validity bitmap comparison is missing an offset on one side of the comparison. This PR adds that offset back in. ### Are these changes tested? Via the integration test CI job. ### Are there any user-facing changes? No * GitHub Issue: apache#41258 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…with zero length offsets to IPC format (apache#41303) ### Rationale for this change Fixes the integration test failures caused by apache#41230 ### What changes are included in this PR? Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format. ### Are these changes tested? Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from apache#41264), and also added new unit tests for this. ### Are there any user-facing changes? This may also be a bug that affects users but it isn't in a released version. * GitHub Issue: apache#41302 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…uffers with non-zero offsets (apache#41259) ### Rationale for this change After apache#41230 , the integration tests are failing on main . ### What changes are included in this PR? The bit-by-bit comparison branch in the validity bitmap comparison is missing an offset on one side of the comparison. This PR adds that offset back in. ### Are these changes tested? Via the integration test CI job. ### Are there any user-facing changes? No * GitHub Issue: apache#41258 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…with zero length offsets to IPC format (apache#41303) ### Rationale for this change Fixes the integration test failures caused by apache#41230 ### What changes are included in this PR? Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format. ### Are these changes tested? Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from apache#41264), and also added new unit tests for this. ### Are there any user-facing changes? This may also be a bug that affects users but it isn't in a released version. * GitHub Issue: apache#41302 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…uffers with non-zero offsets (apache#41259) ### Rationale for this change After apache#41230 , the integration tests are failing on main . ### What changes are included in this PR? The bit-by-bit comparison branch in the validity bitmap comparison is missing an offset on one side of the comparison. This PR adds that offset back in. ### Are these changes tested? Via the integration test CI job. ### Are there any user-facing changes? No * GitHub Issue: apache#41258 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…with zero length offsets to IPC format (apache#41303) ### Rationale for this change Fixes the integration test failures caused by apache#41230 ### What changes are included in this PR? Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format. ### Are these changes tested? Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from apache#41264), and also added new unit tests for this. ### Are there any user-facing changes? This may also be a bug that affects users but it isn't in a released version. * GitHub Issue: apache#41302 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…uffers with non-zero offsets (apache#41259) ### Rationale for this change After apache#41230 , the integration tests are failing on main . ### What changes are included in this PR? The bit-by-bit comparison branch in the validity bitmap comparison is missing an offset on one side of the comparison. This PR adds that offset back in. ### Are these changes tested? Via the integration test CI job. ### Are there any user-facing changes? No * GitHub Issue: apache#41258 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…with zero length offsets to IPC format (apache#41303) ### Rationale for this change Fixes the integration test failures caused by apache#41230 ### What changes are included in this PR? Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format. ### Are these changes tested? Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from apache#41264), and also added new unit tests for this. ### Are there any user-facing changes? This may also be a bug that affects users but it isn't in a released version. * GitHub Issue: apache#41302 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…binary arrays in IPC format (apache#41230) ### Rationale for this change This reduces file sizes when writing sliced binary or list arrays to IPC format. ### What changes are included in this PR? Changes `ArrowStreamWriter` to write only the subset of the values that is needed rather than the full value buffer when writing a `ListArray` or `BinaryArray`, and compute shifted value offset buffers. ### Are these changes tested? This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks. I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new `ArrayComparer` to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode. ### Are there any user-facing changes? Yes, this might reduce IPC file sizes for users writing sliced data. * GitHub Issue: apache#41225 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
…uffers with non-zero offsets (apache#41259) ### Rationale for this change After apache#41230 , the integration tests are failing on main . ### What changes are included in this PR? The bit-by-bit comparison branch in the validity bitmap comparison is missing an offset on one side of the comparison. This PR adds that offset back in. ### Are these changes tested? Via the integration test CI job. ### Are there any user-facing changes? No * GitHub Issue: apache#41258 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…with zero length offsets to IPC format (apache#41303) ### Rationale for this change Fixes the integration test failures caused by apache#41230 ### What changes are included in this PR? Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format. ### Are these changes tested? Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from apache#41264), and also added new unit tests for this. ### Are there any user-facing changes? This may also be a bug that affects users but it isn't in a released version. * GitHub Issue: apache#41302 Authored-by: Adam Reeve <[email protected]> Signed-off-by: Curt Hagenlocher <[email protected]>
Rationale for this change
This reduces file sizes when writing sliced binary or list arrays to IPC format.
What changes are included in this PR?
Changes
ArrowStreamWriter
to write only the subset of the values that is needed rather than the full value buffer when writing aListArray
orBinaryArray
, and compute shifted value offset buffers.Are these changes tested?
This code is covered by existing tests and the change doesn't introduce any difference in the observed array values, so I haven't added new tests or checks.
I did change how list arrays are compared though as we can no longer compare the value and value offset buffers directly, so the tests now get list items as arrays and create a new
ArrayComparer
to compare them. This meant that array offsets are no longer always zero, so I've changed the offset assertions to only be used in strict mode.Are there any user-facing changes?
Yes, this might reduce IPC file sizes for users writing sliced data.