[Operator] index_add #145
base: master
Conversation
src/flag_gems/ops/index_add.py
Outdated
cur_inp = tl.load(inp + inp_off, mask=block_mask, other=0.0).to(tl.float32)
src_off = rows_offsets * N + cols_offsets[None, :]
cur_src = tl.load(src + src_off, mask=block_mask, other=0.0).to(tl.float32)
Possibly lose precision for fp64 src and inputs?
What about just keeping src and inp as-is, without casting?
> Possibly lose precision for fp64 src and inputs?

I've run into precision-loss issues with some data types (such as bf16 and float32), so skipping the cast entirely might cause problems. I'll implement the suggested changes below and see whether they resolve the issue.
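As a quick, standalone illustration of the fp64 concern (not part of this PR): routing float64 data through float32, which is what an explicit .to(tl.float32) in the kernel amounts to, loses precision well above fp64 rounding.

import torch

# Standalone check: add fp64 values after a float32 round-trip,
# mimicking a .to(tl.float32) cast inside the kernel.
x = torch.randn(1024, dtype=torch.float64)
y = torch.randn(1024, dtype=torch.float64)
exact = x + y
via_fp32 = (x.float() + y.float()).double()
# The error sits at the float32 rounding level (~1e-7 relative),
# far above float64 machine epsilon (~2e-16).
print((exact - via_fp32).abs().max())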
src/flag_gems/ops/index_add.py
Outdated
src = dim_compress(src, dim)
out = inp.clone()

grid = lambda meta: (triton.cdiv(M, meta["BLOCK_M"]),)
The input & src are permuted into shapes
input: Shape(M, ...) where product(...) == inp_len
src: Shape(M, ...) where product(...) == N
and made contiguous, so we can view them as
input: Shape(M, inp_len)
src: Shape(M, N)
index: Shape(N,)
The task is then partitioned along the M dimension in tiles of size BLOCK_M, while the N dimension is looped over in tiles of size BLOCK_N.
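A minimal kernel sketch following this (M, inp_len) / (M, N) view (the name index_add_2d_view_kernel and its arguments are illustrative rather than the exact ones in this PR; out is assumed to be a contiguous clone of input, as in the diff above):

import triton
import triton.language as tl

@triton.jit
def index_add_2d_view_kernel(
    out, index, src,                 # out: (M, inp_len) clone of input; src: (M, N); index: (N,)
    M, N, inp_len,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
):
    # Each program owns one BLOCK_M tile of rows of the (M, inp_len) view.
    pid = tl.program_id(0)
    rows = pid * BLOCK_M + tl.arange(0, BLOCK_M)
    row_mask = rows < M
    # The N columns of src are looped over in tiles of BLOCK_N.
    for start_n in range(0, N, BLOCK_N):
        cols = start_n + tl.arange(0, BLOCK_N)
        col_mask = cols < N
        mask = row_mask[:, None] & col_mask[None, :]
        # index maps column j of src to column index[j] of the output view.
        idx = tl.load(index + cols, mask=col_mask, other=0)
        src_off = rows[:, None] * N + cols[None, :]
        out_off = rows[:, None] * inp_len + idx[None, :]
        cur_src = tl.load(src + src_off, mask=mask, other=0)   # loaded in its own dtype
        cur_out = tl.load(out + out_off, mask=mask, other=0)
        # Simplification: duplicate indices inside one BLOCK_N tile would need
        # extra handling (e.g. atomics); they are assumed distinct here.
        tl.store(out + out_off, cur_out + cur_src, mask=mask)

Launched with the 1D grid from the diff, grid = lambda meta: (triton.cdiv(M, meta["BLOCK_M"]),), each program covers BLOCK_M rows and walks the whole index.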
Though it is hard to come up with a general solution right now, permuting the tensors to make the inp_len & N dimensions contiguous is not always a good idea.
For example, if input & src are both 2D tensors and we index_add along axis 0, the permutations are not actually needed to make index_add easier.
Yes, this is a key issue I keep running into (it actually comes up in other operators, too). As a temporary solution I added conditional checks, e.g. if the indexed dimension equals (self.ndim - 1), I skip the permutation. I'm not sure how effective this approach is.
BTW, performance testing showed that the permutations can increase latency by about 7x compared to Torch, so cutting out unnecessary permutations is crucial... :(
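A small, runnable sketch of that idea at the wrapper level; prepare_index_add is a hypothetical helper name, and the permute-and-contiguous branch only mirrors what dim_compress in the diff above appears to do:

import torch

def prepare_index_add(inp, dim, src):
    # Hypothetical helper: only permute when the indexed dim is not already
    # the last, contiguous dim, so the (M, inp_len) / (M, N) view comes for free.
    dim = dim % inp.ndim
    if dim == inp.ndim - 1 and inp.is_contiguous() and src.is_contiguous():
        inp_c, src_c = inp, src
    else:
        order = [d for d in range(inp.ndim) if d != dim] + [dim]
        inp_c = inp.permute(*order).contiguous()   # roughly what dim_compress does
        src_c = src.permute(*order).contiguous()
    inp_len = inp.size(dim)
    N = src.size(dim)
    M = inp_c.numel() // inp_len
    return inp_c, src_c, M, N, inp_len

This only covers the dim == ndim - 1 case described above; the 2D, axis-0 example from the previous comment would still take the permute branch.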
There is still some room for optimization, but LGTM.
Some suggestions:
- Ensure index is contiguous, or take its stride into account;
- Keep the data loaded from src as-is to avoid a down-cast;
- (Maybe) use some heuristics for better task partitioning and to avoid unnecessary data permutations.
* Use a 2D grid with the kernel
* Ensure index is contiguous
* Keep the data loaded from src in the kernel as-is
* Try to avoid some unnecessary permutations
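Pulling these points together, a hedged sketch of how the reworked launch might look; the kernel name, launch_index_add, and the block sizes are illustrative, and tl.atomic_add is used here only to tolerate duplicate indices across programs (it is not available for every dtype on every backend):

import triton
import triton.language as tl

@triton.jit
def index_add_2d_grid_kernel(
    out, index, src, M, N, inp_len,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
):
    # 2D grid: axis 0 tiles the M rows, axis 1 tiles the N source columns.
    rows = tl.program_id(0) * BLOCK_M + tl.arange(0, BLOCK_M)
    cols = tl.program_id(1) * BLOCK_N + tl.arange(0, BLOCK_N)
    mask = (rows[:, None] < M) & (cols[None, :] < N)
    idx = tl.load(index + cols, mask=(cols < N), other=0)
    cur_src = tl.load(src + rows[:, None] * N + cols[None, :], mask=mask, other=0)  # no down-cast
    out_off = rows[:, None] * inp_len + idx[None, :]
    # atomic_add tolerates duplicate indices landing in different programs
    # (not supported for every dtype, e.g. bf16 on some backends).
    tl.atomic_add(out + out_off, cur_src, mask=mask)

def launch_index_add(out, index, src, M, N, inp_len):
    index = index.contiguous()   # unit-stride loads of index inside the kernel
    grid = lambda meta: (triton.cdiv(M, meta["BLOCK_M"]), triton.cdiv(N, meta["BLOCK_N"]))
    index_add_2d_grid_kernel[grid](out, index, src, M, N, inp_len, BLOCK_M=32, BLOCK_N=128)
    return out

Here out is again the flattened clone of input, and the index.contiguous() call covers the first suggestion above.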
We have completed the development of the index_add operator. Specifically:
index_add