GH-41263: [C#][Integration] Ensure offset is considered in all branches of the bitmap comparison #41264

Merged

@paleolimbot (Member) commented Apr 17, 2024

Rationale for this change

The optimization for validity buffers was still failing after #41259 (sorry!).

What changes are included in this PR?

There were still two problems:

  • The offset of the actual array was not considered in the "optimized" branch
  • When this offset was considered, it became clear that zero-length arrays were not going to work in that branch
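To make the two problems concrete, here is a minimal sketch in Python (not the C# code changed in this PR) of a validity bitmap comparison that applies each array's own offset in every branch and returns early for zero-length arrays. The names `get_bit` and `validity_equal` are hypothetical, and the byte-aligned fast path is only an assumption about what an "optimized" branch might look like. Arrow validity bitmaps are least-significant-bit first, which is what `get_bit` assumes.

```python
def get_bit(buf: bytes, i: int) -> bool:
    """Read bit i from an LSB-first bitmap, as used by Arrow validity buffers."""
    return (buf[i >> 3] >> (i & 7)) & 1 == 1


def validity_equal(expected: bytes, expected_offset: int,
                   actual: bytes, actual_offset: int,
                   length: int) -> bool:
    """Compare `length` validity bits, each bitmap starting at its own bit offset."""
    # Zero-length arrays: nothing to compare, and the buffers may be empty,
    # so no buffer access is attempted.
    if length == 0:
        return True

    # Hypothetical fast path: both offsets are byte-aligned, so whole bytes
    # can be compared directly. Both offsets must be applied here too.
    if expected_offset % 8 == 0 and actual_offset % 8 == 0:
        e0 = expected_offset // 8
        a0 = actual_offset // 8
        full_bytes = length // 8
        if expected[e0:e0 + full_bytes] != actual[a0:a0 + full_bytes]:
            return False
        # Trailing bits that do not fill a whole byte are compared one by one,
        # so padding bits past `length` are ignored.
        start = full_bytes * 8
    else:
        # Slow path: compare every bit individually.
        start = 0

    for i in range(start, length):
        if get_bit(expected, expected_offset + i) != get_bit(actual, actual_offset + i):
            return False
    return True
```

For example, `validity_equal(bytes([0x15]), 0, bytes([0xA8]), 3, 5)` compares the same five bits stored at bit offset 0 in one buffer and bit offset 3 in the other.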

Are these changes tested?

I added the integration workflow to also run for C# additions. This might be a heavy CI job and I'm not sure if you want to keep it there (but running it is useful for this PR to ensure I actually fix things).

For future me (or maybe future others), the integration tests are pretty easy to check:

dotnet build
archery integration --run-c-data --with-csharp=true

Are there any user-facing changes?

No.

⚠️ GitHub issue #41263 has been automatically assigned in GitHub to PR creator.

@paleolimbot (Member, Author)

I'm not sure what's going on with the zerolength case, but it seems to be failing for Java and C# producing (with C# consuming):

################# FAILURES #################
FAILED TEST: primitive_zerolength Java producing,  C# consuming
<class 'subprocess.CalledProcessError'>: Command '/arrow/csharp/artifacts/Apache.Arrow.IntegrationTest/Debug/net7.0/Apache.Arrow.IntegrationTest --mode stream-to-file -a /tmp/tmpaibchz2w/28737e0c_generated_primitive_zerolength.consumer_stream_as_file < /tmp/tmpaibchz2w/28737e0c_generated_primitive_zerolength.producer_file_as_stream' returned non-zero exit status 1.

FAILED TEST: union Java producing,  C# consuming
<class 'subprocess.CalledProcessError'>: Command '/arrow/csharp/artifacts/Apache.Arrow.IntegrationTest/Debug/net7.0/Apache.Arrow.IntegrationTest --mode stream-to-file -a /tmp/tmpaibchz2w/3795be39_generated_union.consumer_stream_as_file < /tmp/tmpaibchz2w/3795be39_generated_union.producer_file_as_stream' returned non-zero exit status 1.

FAILED TEST: primitive_zerolength C# producing,  C# consuming
<class 'subprocess.CalledProcessError'>: Command '/arrow/csharp/artifacts/Apache.Arrow.IntegrationTest/Debug/net7.0/Apache.Arrow.IntegrationTest --mode stream-to-file -a /tmp/tmpaibchz2w/fd9d0ef3_0.14.1_primitive_zerolength.gold.consumer_stream_as_file < /arrow/testing/data/arrow-ipc-stream/integration/0.14.1/generated_primitive_zerolength.stream' returned non-zero exit status 1.

@adamreeve (Contributor) commented Apr 18, 2024

Hi, sorry for breaking this! I didn't think to also run the integration tests.

The Java tests are unrelated to the validity buffer validation but are throwing an error when writing zero length binary arrays to IPC format, due to my more recent changes in #41230. When we create a zero-length binary or list array in C# we get an offsets buffer with length 1, but the arrays read from Java have zero-length offsets.
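As a rough illustration of that mismatch, the sketch below (Python, not the C# writer code) computes the range of value bytes a binary array uses from its int32 offsets buffer. The function name `value_data_range` is hypothetical. The point is the guard: when the array length is zero, the offsets buffer may contain one entry or none, so it is not read at all.

```python
import struct
from typing import Tuple


def value_data_range(offsets_buf: bytes, array_offset: int, length: int) -> Tuple[int, int]:
    """Return the (start, end) byte range of the values buffer used by a binary array.

    offsets_buf holds little-endian int32 offsets, as in the Arrow
    variable-size binary layout.
    """
    if length == 0:
        # A zero-length array may come with a one-element offsets buffer
        # or an empty one, so avoid touching the buffer in this case.
        return (0, 0)
    (start,) = struct.unpack_from("<i", offsets_buf, 4 * array_offset)
    (end,) = struct.unpack_from("<i", offsets_buf, 4 * (array_offset + length))
    return (start, end)
```

Both `value_data_range(b"", 0, 0)` and `value_data_range(struct.pack("<i", 0), 0, 0)` take the early return, so the two zero-length representations are treated the same way.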

I've pushed a fix for this to a branch in my fork: adamreeve@b2bfb74. Do you want to include that in this PR or should I make a separate one for that?

@paleolimbot (Member, Author)

> Do you want to include that in this PR or should I make a separate one for that?

I think that you should do it and I'll rebase! I am not all that confident in my C# 😬

@adamreeve (Contributor)

I've opened #41303 to fix the issue writing empty binary arrays. With that change plus the ones here, the integration tests all pass for me when running with C#, Java and C++.

CurtHagenlocher pushed a commit that referenced this pull request Apr 19, 2024
…ero length offsets to IPC format (#41303)

### Rationale for this change

Fixes the integration test failures caused by #41230

### What changes are included in this PR?

Only try to access the offset values if the array length is non-zero when writing list and binary arrays to IPC format.

### Are these changes tested?

Yes, I've manually run the integration tests with C# and Java to verify they pass (when also including the changes from #41264), and also added new unit tests for this.

### Are there any user-facing changes?

This may also be a bug that affects users but it isn't in a released version.
* GitHub Issue: #41302

Authored-by: Adam Reeve <[email protected]>
Signed-off-by: Curt Hagenlocher <[email protected]>
@paleolimbot force-pushed the csharp-validity-compare-again branch from 867be6b to 6a9e6b3 Apr 19, 2024 12:33
@CurtHagenlocher merged commit f8ef09a into apache:main Apr 19, 2024
10 checks passed
@CurtHagenlocher removed the "awaiting committer review" label Apr 19, 2024
@github-actions bot added the "awaiting committer review" label Apr 19, 2024
@paleolimbot deleted the csharp-validity-compare-again branch April 19, 2024 13:18
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit f8ef09a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 43 possible false positives for unstable benchmarks that are known to sometimes produce them.

paleolimbot added a commit that referenced this pull request May 10, 2024
…ration setup (#39302)

### Rationale for this change

The ability to add integration testing was added in nanoarrow; however, the infrastructure for running these tests currently lives in the arrow monorepo.

### What changes are included in this PR?

- Added the relevant code to Archery such that these tests can be run
- Added the relevant scripts/environment variables to CI such that these tests run in the integration CI job

### Are these changes tested?

Yes, via the "Integration" CI job.

### Are there any user-facing changes?

No.

This PR still needs #41264 for the integration tests to pass.

* Closes: #39301
* GitHub Issue: #39301

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>