
[FP8] Support Weight Dequantize FP16xFP8_E4M3 #42

Merged: 3 commits into microsoft:main on May 20, 2024

Conversation

LeiWang1999
Contributor

This pull request expands the existing codebase to support new FP8 formats and simplifies existing code. The most significant changes are the addition of the FP8_E4M3 and FP8_E5M2 formats to the check_weight_decode_info function, code simplification in general_matmul.py, and new conversion functions in quantization.py.

Addition of new formats: FP8_E4M3 and FP8_E5M2 are now accepted by check_weight_decode_info.

Code simplification: in general_matmul.py.

Addition of new conversion functions: in quantization.py, for dequantizing FP8 weights to FP16 (see the sketch below).
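
For readers unfamiliar with the format, here is a minimal NumPy sketch of what decoding FP8_E4M3 weights to FP16 involves. It assumes the common E4M3FN layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, with 0x7F/0xFF reserved for NaN); the helper name fp8_e4m3_to_fp16 is illustrative and this is not the PR's actual kernel code, which targets the matmul pipeline directly.

```python
# Minimal NumPy sketch of FP8 E4M3 -> FP16 decoding, assuming the common
# E4M3FN layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits,
# no infinities, and 0x7F / 0xFF reserved for NaN. Illustrative only;
# `fp8_e4m3_to_fp16` is a hypothetical helper, not this PR's kernel code.
import numpy as np

def fp8_e4m3_to_fp16(raw: np.ndarray) -> np.ndarray:
    """Decode a uint8 array of FP8 E4M3 bit patterns to float16."""
    raw = raw.astype(np.uint8)
    sign = np.where(raw & 0x80, -1.0, 1.0)          # bit 7
    exp = ((raw >> 3) & 0x0F).astype(np.float32)    # bits 6..3
    man = (raw & 0x07).astype(np.float32)           # bits 2..0

    # Normal numbers:    (1 + mantissa/8) * 2^(exp - 7)
    normal = (1.0 + man / 8.0) * np.exp2(exp - 7.0)
    # Subnormal numbers (exp == 0): (mantissa/8) * 2^(1 - 7)
    subnormal = (man / 8.0) * np.exp2(-6.0)

    out = sign * np.where(exp == 0, subnormal, normal)
    # E4M3FN reserves exponent 0b1111 with mantissa 0b111 for NaN.
    out = np.where((raw & 0x7F) == 0x7F, np.nan, out)
    return out.astype(np.float16)

# 0x38 -> 1.0, 0xB8 -> -1.0, 0x7E -> 448.0 (largest normal)
print(fp8_e4m3_to_fp16(np.array([0x38, 0xB8, 0x7E], dtype=np.uint8)))
```

In a real FP16xFP8_E4M3 matmul path this decode runs per weight element before (or fused into) the multiply-accumulate, which is why only the weight operand needs a conversion function while activations stay in FP16.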

@LeiWang1999 LeiWang1999 merged commit 42f379c into microsoft:main May 20, 2024
3 checks passed
@LeiWang1999 LeiWang1999 deleted the dev/fp8 branch July 6, 2024 08:14