[AMD] Added instr.sched guards for the FA-like kernels #5163

Draft: wants to merge 1 commit into main

ravil-mobile
Contributor

Extends AMDGPU instruction scheduling for Flash Attention-like kernels. This change adds sched.barriers (called guards) at the beginning and at the end of each scf.For op that contains at least 2 tt.Dot, tt.reduce and at least one math::Exp2Op ops. The guards keep instructions in the basic blocks adjacent to the for-loop bodies from being moved across the loop boundaries. According to test results, this improves performance for the FA kernels because it reduces VGPR spilling.
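The guard placement described above can be illustrated with a toy model. This is only a sketch in Python, not the pass from this PR: a loop body is modeled as a flat list of op-name strings, and the names `is_fa_like`, `add_guards`, and `GUARD`, as well as the op-name spellings, are hypothetical. The thresholds are an assumption too: the description is read here as "at least two `tt.dot`, at least two `tt.reduce`, at least one exp2", and they are exposed as parameters in case the intended reading differs.

```python
from collections import Counter

# Hypothetical marker standing in for the scheduling barrier (guard).
GUARD = "sched.barrier"


def is_fa_like(body_ops, min_dots=2, min_reduces=2, min_exp2=1):
    """Return True if a loop body looks like a Flash Attention-style loop.

    The thresholds are assumptions taken from one reading of the PR text.
    """
    counts = Counter(body_ops)
    return (
        counts["tt.dot"] >= min_dots
        and counts["tt.reduce"] >= min_reduces
        and counts["math.exp2"] >= min_exp2
    )


def add_guards(body_ops):
    """Return a copy of the body with guards at its start and end.

    Bodies that do not match the FA-like pattern are returned unchanged.
    A trailing "scf.yield" terminator stays last, so the closing guard
    is placed just before it.
    """
    ops = list(body_ops)
    if not is_fa_like(ops):
        return ops
    if ops and ops[-1] == "scf.yield":
        return [GUARD, *ops[:-1], GUARD, ops[-1]]
    return [GUARD, *ops, GUARD]
```

In this model, a body such as `["tt.dot", "math.exp2", "tt.reduce", "tt.reduce", "tt.dot", "scf.yield"]` gains one guard at the start and one just before the terminator, while a loop with a single `tt.dot` is left untouched.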

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because I did the source code refactoring. The current tests are supposed to be enough.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)
