[Dev][TL] Implement MMA INT4 Tensor Core and Correctness Test Case. #232
Merged
Commits on Oct 4, 2024
- acb4aa4
Commits on Nov 1, 2024
- e305a72
- 4aa081c
- 811e5c7
- 4a0afc9
- d2f7fcb: Refactor tensor core memory allocation in MatmulFineGrainScheduler
  - Adjusted the local fragment sizes for tensor core memory allocation in the MatmulFineGrainScheduler class.
  - Updated the allocation sizes for A_local, B_local, and C_local variables based on the new fragment sizes.
  - The changes ensure efficient memory utilization and improve performance.

  Refactor tensor core memory allocation in MatmulDequantizeFineGrainedScheduler
  - Modified the fragment sizes for tensor core memory allocation in the MatmulDequantizeFineGrainedScheduler class.
  - Updated the allocation sizes for A_frag, B_frag, and C_frag variables based on the new fragment sizes.
  - The changes optimize memory usage and enhance the efficiency of the dequantization process.

  Refactor tensor core memory allocation in MatmulDequantizeWeightPropagationScheduler
  - Adjusted the fragment sizes for tensor core memory allocation in the MatmulDequantizeWeightPropagationScheduler class.
  - Updated the allocation sizes for A_frag, B_frag, B_dequantize_frag, and C_frag variables based on the new fragment sizes.
  - The changes improve memory utilization and optimize the weight propagation process.
- 2af586d
- fd4973c
- 8f7767b
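
The "Refactor tensor core memory allocation" commit message says the A_local, B_local, and C_local allocation sizes are derived from fragment sizes, but the page does not show the arithmetic. The sketch below is one plausible reading, not the code from this PR: it assumes each of the 32 threads in a warp holds an equal share of a micro tile, and that a warp covers several micro tiles along rows and columns. The function names and the parameters `micro_m`, `micro_n`, `micro_k`, `warp_rows`, and `warp_cols` are hypothetical and chosen only for illustration.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def fragment_sizes(micro_m, micro_n, micro_k, warp_size=WARP_SIZE):
    """Per-thread element counts for the A, B, and C fragments of one micro tile.

    Assumption: the tile elements are split evenly across the warp.
    """
    def per_thread(elements):
        if elements % warp_size != 0:
            raise ValueError(
                f"{elements} elements cannot be split evenly across {warp_size} threads"
            )
        return elements // warp_size

    size_a = per_thread(micro_m * micro_k)  # A tile is micro_m x micro_k
    size_b = per_thread(micro_n * micro_k)  # B tile is micro_n x micro_k
    size_c = per_thread(micro_m * micro_n)  # C tile is micro_m x micro_n
    return size_a, size_b, size_c


def local_buffer_sizes(warp_rows, warp_cols, micro_m, micro_n, micro_k):
    """Hypothetical per-thread buffer lengths when a warp covers
    warp_rows x warp_cols micro tiles of the output."""
    size_a, size_b, size_c = fragment_sizes(micro_m, micro_n, micro_k)
    return {
        "A_local": warp_rows * size_a,
        "B_local": warp_cols * size_b,
        "C_local": warp_rows * warp_cols * size_c,
    }


if __name__ == "__main__":
    print(fragment_sizes(16, 16, 16))
    print(local_buffer_sizes(2, 2, 16, 16, 16))
```

Under these assumptions, widening the reduction dimension (for example, `micro_k` of 32 instead of 16) doubles the per-thread A and B fragments while leaving the C fragment unchanged, which is the kind of adjustment a narrower input type would call for.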