Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type #41301

Closed
3 tasks done
felipecrv opened this issue Apr 19, 2024 · 1 comment
Closed
3 tasks done

Comments

@felipecrv
Copy link
Contributor

felipecrv commented Apr 19, 2024

Describe the enhancement requested

The last step is made much easier by using the utilities introduced in #41297 and requires that the loop learn to handle fixed byte-widths that are only known at runtime. Currently PrimitiveTakeExec can only handle powers of 2 widths from 1 to 32 as template specialization parameters.

I have all of this working locally on my machine. I "just" need to clean it up for review.

Component(s)

C++

@felipecrv felipecrv self-assigned this Apr 19, 2024
felipecrv added a commit that referenced this issue May 3, 2024
…ixed-size lists (#41297)

### Rationale for this change

Introduce utilities for dealing with fixed-width types (including fixed-size lists of fixed-width types) generically. And use it for initial optimizations of `Take` and `Filter`.

### What changes are included in this PR?

- [x] Introduce utilities for dealing with fixed-width types generically
- [x] Use faster `Take` kernel on small power-of-2 byte widths of fixed-width types
  - [x] from `FSLTakeExec` (including FSLs of FSBs)
  - [x] from `FSBTakeExec` (done before this PR)
- [x] ~Take on any fixed-width type~ (as a separate issue #41301)
- [x] Use faster `Filter` kernel on both primitive and fixed-width types of any length
   - [x] from `FSLFilterExec` (including FSLs of FSBs)
   - [x] from `FSBFilterExec` (done before this PR)

### Are these changes tested?

By existing and new tests.

### Are there any user-facing changes?

Some functions added to the `arrow::util` namespace and documented inline.
* GitHub Issue: #39798

Authored-by: Felipe Oliveira Carvalho <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…sted fixed-size lists (apache#41297)

### Rationale for this change

Introduce utilities for dealing with fixed-width types (including fixed-size lists of fixed-width types) generically. And use it for initial optimizations of `Take` and `Filter`.

### What changes are included in this PR?

- [x] Introduce utilities for dealing with fixed-width types generically
- [x] Use faster `Take` kernel on small power-of-2 byte widths of fixed-width types
  - [x] from `FSLTakeExec` (including FSLs of FSBs)
  - [x] from `FSBTakeExec` (done before this PR)
- [x] ~Take on any fixed-width type~ (as a separate issue apache#41301)
- [x] Use faster `Filter` kernel on both primitive and fixed-width types of any length
   - [x] from `FSLFilterExec` (including FSLs of FSBs)
   - [x] from `FSBFilterExec` (done before this PR)

### Are these changes tested?

By existing and new tests.

### Are there any user-facing changes?

Some functions added to the `arrow::util` namespace and documented inline.
* GitHub Issue: apache#39798

Authored-by: Felipe Oliveira Carvalho <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
felipecrv added a commit that referenced this issue Jun 10, 2024
…nd generalize to any fixed-width type (#41373)

### Rationale for this change

I want to instantiate this primitive operation in other scenarios (e.g. the optimized version of `Take` that handles chunked arrays) and extend the sub-classes of `GatherCRTP` with different member functions that re-use the `WriteValue` function generically (any fixed-width type and even bit-wide booleans).

When taking these improvements to `Filter` I will also re-use the "gather" concept and parameterize it by bitmaps/boolean-arrays instead of selection vectors (`indices`) like `take` does. So gather is not a "renaming of take" but rather a generalization of `take` and `filter` do in Arrow with different representations of what should be gathered from the `values` array.

### What changes are included in this PR?

 - Introduce the Gather class helper to delegate fixed-width memory gathering: both static and dynamically sized (size known at compile time or size known at runtime)
 - Specialized `Take` implementation for values/indices without nulls
 - Fold the Boolean, Primitives, and Fixed-Width Binary implementation of `Take` into a single one
 - Skip validity bitmap allocation when inputs (values and indices) have no nulls
 
### Are these changes tested?

 - Existing tests
 - New test assertions that check that `Take` guarantees null values are zeroed out

* GitHub Issue: #41301

Authored-by: Felipe Oliveira Carvalho <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
@felipecrv
Copy link
Contributor Author

Issue resolved by pull request 41373
#41373

@felipecrv felipecrv added this to the 17.0.0 milestone Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant