Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41198: [C#] Fix concatenation of union arrays #41226

Merged
merged 2 commits into from
Apr 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 15 additions & 7 deletions csharp/src/Apache.Arrow/Arrays/ArrayDataConcatenator.cs
Original file line number Diff line number Diff line change
Expand Up @@ -367,7 +367,7 @@ private ArrowBuffer ConcatenateUnionTypeBuffer()

foreach (ArrayData arrayData in _arrayDataList)
{
builder.Append(arrayData.Buffers[0]);
builder.Append(arrayData.Buffers[0].Span.Slice(arrayData.Offset, arrayData.Length));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using the offset here but I haven't tested this works fully with sliced arrays, I'll follow up on this in #41164.

}

return builder.Build(_allocator);
Expand All @@ -376,18 +376,26 @@ private ArrowBuffer ConcatenateUnionTypeBuffer()
private ArrowBuffer ConcatenateUnionOffsetBuffer()
{
var builder = new ArrowBuffer.Builder<int>(_totalLength);
int baseOffset = 0;
var typeCount = _arrayDataList.Count > 0 ? _arrayDataList[0].Children.Length : 0;
var baseOffsets = new int[typeCount];

foreach (ArrayData arrayData in _arrayDataList)
{
ReadOnlySpan<int> span = arrayData.Buffers[1].Span.CastTo<int>();
foreach (int offset in span)
ReadOnlySpan<byte> typeSpan = arrayData.Buffers[0].Span.Slice(arrayData.Offset, arrayData.Length);
ReadOnlySpan<int> offsetSpan = arrayData.Buffers[1].Span.CastTo<int>().Slice(arrayData.Offset, arrayData.Length);
for (int i = 0; i < arrayData.Length; ++i)
{
builder.Append(baseOffset + offset);
var typeId = typeSpan[i];
builder.Append(checked(baseOffsets[typeId] + offsetSpan[i]));
}

// The next offset must start from the current last offset.
baseOffset += span[arrayData.Length];
for (int i = 0; i < typeCount; ++i)
{
checked
{
baseOffsets[i] += arrayData.Children[i].Length;
}
}
}

return builder.Build(_allocator);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,12 +29,6 @@ public void TestStandardCases()
{
foreach ((List<IArrowArray> testTargetArrayList, IArrowArray expectedArray) in GenerateTestData())
{
if (expectedArray is UnionArray)
{
// Union array concatenation is incorrect. See https://github.com/apache/arrow/issues/41198
continue;
}

IArrowArray actualArray = ArrowArrayConcatenator.Concatenate(testTargetArrayList);
ArrowReaderVerifier.CompareArrays(expectedArray, actualArray);
}
Expand Down Expand Up @@ -604,10 +598,11 @@ public void Visit(UnionType type)

for (int j = 0; j < dataList.Count; j++)
{
byte index = (byte)Math.Max(j % 3, 1);
byte index = (byte)Math.Min(j % 3, 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this was the intended behaviour, otherwise the index == 0 branch below is never used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the original code, the indexes would be 1, 1, 2, 1, 1, 2, etc.. With this change, the indexes would be 0, 1, 1, 0, 1, 1, etc.

int? intValue = (index == 1) ? dataList[j] : null;
string stringValue = (index == 1) ? null : dataList[j]?.ToString();
typeBuilder.Append(index);
typeResultBuilder.Append(index);

if (isDense)
{
Expand Down
Loading