Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler integration of the Split and Concat operations. #1179

Closed
wants to merge 2 commits into from

Conversation

mdanilow
Copy link

@mdanilow mdanilow commented Sep 9, 2024

The backends of these operations are now implemented as a part of the finn-hlslib repo, the PR is created (Xilinx/finn-hlslib#143).

mdaniowi added 2 commits September 6, 2024 17:55
…e-clock hls modules added, duplicatestreams layer fix in convert_to_hw_layers, insert_fifo fix for the case of last node in the graph being a fork node
… changed after the PR is accepted in the finn-hlslib repo
@mdanilow mdanilow changed the title Compiler integration of the new Split op eration and refactor of the Concat operation. Compiler integration of the Split and Concat operations. Sep 9, 2024
@iksnagreb
Copy link
Contributor

iksnagreb commented Sep 11, 2024

Hey @mdanilow, this looks promising and I am currently trying to integrate the Split operation to streamline and deploy the GLU activation function here, which seems to be successful, at least for a small isolated GLU function in simulation. This comes just at the right time, I was about to build it myself, but this might safe me a lot of time, thank you!

However, I still have a few questions or concerns regarding more generalized split/concat support, in particular aiming at full ONNX compliance:

  • How difficult do you think would it be to extend this to splitting and concatenating along arbitrary axes, not just the last one?
  • The InferSplitLayer transformation seems to target opset version 13 or above, right? That is where the split sizes are given as a second input tensor. QONNX currently still seems to prefer opset version 11 (where the split sizes are given as an attribute). This is not necessarily an issue, you can just manually set the opset version, but maybe we want to have some cleanup transformation converting earlier opsets to the format accepted by your transformation?
  • Wouldn't it make sense to decouple the parallelism at the input and output side of the operators? For example, it seems like a completely unrolled input for the split, i.e., read all elements and distribute them to all output streams within one cycle, is currently not possible? I.e., splitting N elements to N outputs each of size 1 should be possible in a single cycle with SIMD=N, but the current implementation would do this in N cycles with SIMD=1 as SIMD cannot exceed the smallest output size, or do I miss something?

@mdanilow mdanilow closed this Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants