apacheGH-40560: [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (apache#40560) (apache#41093)

### Rationale for this change

The documentation suggests that `RunEndEncodedArray.from_arrays` takes two `Array` parameters, as would be expected of a `from_arrays` method. However, if given an `Array` instance for the `run_ends` parameter, it errors because `Array.__getitem__` returns a pyarrow scalar instead of a native Python integer.

### What changes are included in this PR?

* Handle `Array` parameters for `run_ends` by unconditionally coercing the logical length to a pyarrow scalar, then to a Python native value.
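For context on why the last run end serves as the logical length, here is a pure-Python sketch of run-end decoding. Both helper functions are hypothetical illustrations written for this note, not pyarrow APIs; the sample data matches the values used in the unit tests.

```python
def decode_run_end_encoded(run_ends, values):
    """Expand run-end encoded data into a plain list (illustrative helper)."""
    if len(run_ends) != len(values):
        raise ValueError("run_ends and values must have the same length")
    decoded = []
    start = 0
    for end, value in zip(run_ends, values):
        # Each run covers the logical positions [start, end).
        decoded.extend([value] * (end - start))
        start = end
    return decoded


def logical_length(run_ends):
    """Last run end, or 0 for empty input (mirrors the expression in the patch)."""
    return run_ends[-1] if len(run_ends) > 0 else 0


run_ends = [3, 5, 10, 19]
values = [1, 2, 1, 3]
decoded = decode_run_end_encoded(run_ends, values)
```

With these inputs the decoded list has 19 elements, which equals `logical_length(run_ends)`.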

### Are these changes tested?

Yes. Augmented the existing unit tests to test with `Array` as well as Python lists, and check that the data types of the `Array` instances correctly carry over to the data type of the `RunEndEncodedArray`.

### Are there any user-facing changes?

None apart from the bugfix; this was the minimum change necessary to make `Array` parameters work. `RunEndEncodedArray.from_arrays` continues to support other inputs, such as Python lists, as before.

* GitHub Issue: apache#40560

Authored-by: Hemidark <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
hemidark authored and vibhatha committed May 25, 2024
1 parent 927f567 commit 510e072
Showing 2 changed files with 12 additions and 1 deletion.
2 changes: 1 addition & 1 deletion python/pyarrow/array.pxi
@@ -3984,7 +3984,7 @@ cdef class RunEndEncodedArray(Array):
         -------
         RunEndEncodedArray
         """
-        logical_length = run_ends[-1] if len(run_ends) > 0 else 0
+        logical_length = scalar(run_ends[-1]).as_py() if len(run_ends) > 0 else 0
         return RunEndEncodedArray._from_arrays(type, True, logical_length,
                                                run_ends, values, 0)

11 changes: 11 additions & 0 deletions python/pyarrow/tests/test_array.py
@@ -3578,12 +3578,23 @@ def check_run_end_encoded_from_arrays_with_type(ree_type=None):
     check_run_end_encoded(ree_array, run_ends, values, 19, 4, 0)


+def check_run_end_encoded_from_typed_arrays(ree_type):
+    run_ends = [3, 5, 10, 19]
+    values = [1, 2, 1, 3]
+    typed_run_ends = pa.array(run_ends, ree_type.run_end_type)
+    typed_values = pa.array(values, ree_type.value_type)
+    ree_array = pa.RunEndEncodedArray.from_arrays(typed_run_ends, typed_values)
+    assert ree_array.type == ree_type
+    check_run_end_encoded(ree_array, run_ends, values, 19, 4, 0)
+
+
 def test_run_end_encoded_from_arrays():
     check_run_end_encoded_from_arrays_with_type()
     for run_end_type in [pa.int16(), pa.int32(), pa.int64()]:
         for value_type in [pa.uint32(), pa.int32(), pa.uint64(), pa.int64()]:
             ree_type = pa.run_end_encoded(run_end_type, value_type)
             check_run_end_encoded_from_arrays_with_type(ree_type)
+            check_run_end_encoded_from_typed_arrays(ree_type)


 def test_run_end_encoded_from_buffers():
