[FP8] Support Weight Dequantize FP16xFP8_E4M3 #42
This pull request adds support for new FP8 weight formats and simplifies existing code. The most significant changes are the addition of the new formats (FP8_E4M3, FP8_E5M2) to the `check_weight_decode_info` function, code simplification in `general_matmul.py`, and new conversion functions in `quantization.py`.

**Addition of new formats:**
- `python/bitblas/gpu/gemv_dequantize.py`: Updated the `check_weight_decode_info` function to include the new formats "fp_e5m2" and "fp_e4m3" in the list of acceptable formats (a sketch of the widened check follows this list).
- `python/bitblas/gpu/matmul_mma_dequantize.py`: Made the same change here to accept the new format "fp_e4m3".
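A minimal sketch of what the widened validation might look like, assuming the `source_format`/`target_format` dict layout used by similar BitBLAS scheduler checks; it is illustrative, not the verbatim code:

```python
def check_weight_decode_info(weight_decode_info: dict) -> bool:
    # Validate that the scheduler supports the requested dequantize path.
    conditions = [
        "source_format" in weight_decode_info,
        # "fp_e4m3" and "fp_e5m2" join the previously accepted source formats
        weight_decode_info["source_format"]["format"]
        in ["uint", "int", "fp", "nf", "fp_e4m3", "fp_e5m2"],
        weight_decode_info["source_format"]["bits"] in [1, 2, 4, 8],
        # the dequantized target must be a compute dtype the kernel handles
        "target_format" in weight_decode_info,
        weight_decode_info["target_format"] in ["float16", "int8"],
    ]
    return all(conditions)
```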
**Code simplification:**

- `python/bitblas/ops/general_matmul.py`: Simplified the code throughout the file, mostly by combining statements and removing unnecessary parentheses to reduce the line count.

**Addition of new conversion functions:**
- `python/bitblas/quantization/quantization.py`: Added new conversion functions `_tir_u8_to_f8_e4m3_to_f16` and `_tir_u8_to_f8_e5m2_to_f16` to support the new formats (see the sketch after this list).
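Both helpers unpack a uint8 holding an FP8 value into FP16 using shifts and masks. The actual functions operate on TVM TIR expressions, but the bit manipulation can be sketched in plain Python (an illustration of the idea, not the TIR code; E4M3 subnormal and NaN handling is omitted for brevity):

```python
import struct

def f8_e4m3_to_f16_bits(byte: int) -> int:
    """E4M3 (1 sign / 4 exponent, bias 7 / 3 mantissa) -> FP16 bit pattern.
    Covers normal values and zero; E4M3 subnormals and NaN are omitted."""
    sign = (byte >> 7) & 0x1
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0 and mant == 0:
        return sign << 15                       # signed zero
    # Rebias the exponent (7 -> 15) and widen the mantissa from 3 to 10 bits.
    return (sign << 15) | ((exp + 8) << 10) | (mant << 7)

def f8_e5m2_to_f16_bits(byte: int) -> int:
    """E5M2 (1 sign / 5 exponent, bias 15 / 2 mantissa) -> FP16 bit pattern.
    E5M2 is FP16 with the low 8 mantissa bits dropped, so the conversion
    is a plain left shift; zeros, subnormals, inf and NaN all map correctly."""
    return byte << 8

def f16_from_bits(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits))[0]

print(f16_from_bits(f8_e4m3_to_f16_bits(0x40)))  # 0x40 encodes +2.0 in E4M3
print(f16_from_bits(f8_e5m2_to_f16_bits(0x40)))  # 0x40 encodes +2.0 in E5M2
```

Note the asymmetry between the two formats: E4M3 needs an exponent rebias because its bias (7) differs from FP16's (15), while E5M2 shares FP16's exponent width and bias, so its conversion degenerates to a single shift.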