[Dev] Fix a bug within FP8 E4M3 Fast Decoding #54

LeiWang1999 · 2024-06-06T16:01:57Z

This pull request extends the bitblas Python package and bumps its version. It exposes MatmulConfigWithSplitK and MatmulWithSplitK from the bitblas module, updates the gemv and gemv_dequantize modules to accept blocks with four iterators, and revises the FP8 E4M3 decoding functions in the quantization module. The version number moves from 0.0.1.dev9 to 0.0.1.dev12.

Version Update:

VERSION and python/bitblas/__init__.py: Updated the version number from 0.0.1.dev9 to 0.0.1.dev12.

Enhancements to bitblas module:

python/bitblas/__init__.py: Imported MatmulConfigWithSplitK and MatmulWithSplitK from general_matmul_splitk module.
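For readers unfamiliar with the term, split-K means partitioning the reduction (K) dimension of a matrix product into chunks, computing a partial product per chunk, and summing the partials. The sketch below illustrates only that idea in plain Python. It does not use or reproduce the bitblas API; the function names `matmul` and `matmul_splitk` and the `splits` parameter are hypothetical.

```python
def matmul(a, b):
    """Plain product of nested lists: (M x K) @ (K x N)."""
    k_total, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_total)) for j in range(n)]
            for row in a]


def matmul_splitk(a, b, splits):
    """Split the K dimension into `splits` equal chunks and sum the partial products.

    Hypothetical illustration only; on a GPU each chunk would typically be
    handled by a different thread block and the partials reduced afterwards.
    """
    k_total = len(b)
    assert k_total % splits == 0, "K must be divisible by the number of splits"
    chunk = k_total // splits
    m, n = len(a), len(b[0])
    out = [[0] * n for _ in range(m)]
    for s in range(splits):
        lo, hi = s * chunk, (s + 1) * chunk
        # Slice columns lo..hi of A and rows lo..hi of B for this chunk.
        partial = matmul([row[lo:hi] for row in a], b[lo:hi])
        for i in range(m):
            for j in range(n):
                out[i][j] += partial[i][j]
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 1], [2, 3]]
result = matmul_splitk(a, b, splits=2)
```

Splitting K is usually worthwhile when M and N are small relative to K, because the extra chunks give the hardware more independent work at the cost of a final reduction.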

Updates to gemv and gemv_dequantize modules:

python/bitblas/gpu/gemv.py: Extended the acceptable range of block_info.iters length to include 4.
python/bitblas/gpu/gemv_dequantize.py: Adjusted the logic in get_vectorize_factor to handle cases where the length of sch.get_loops(block_b) is 4.

Modifications to quantization module:

python/bitblas/quantization/quantization.py: Revised the _tir_u8_to_f8_e4m3_to_f16 function and added a new function, _tir_u8_to_f8_e4m3_to_f16_naive, to correct the decoding of FP8 E4M3 values to float16.
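As background on what such a decoder has to compute, the sketch below decodes an FP8 E4M3 byte (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities) in two ways: field by field, and with a bit trick that drops the byte into a float16 bit pattern and rescales it. This is a plain-Python illustration under the common E4M3 convention, not the TIR code changed in this PR; both function names are hypothetical.

```python
import struct


def e4m3_reference(byte):
    """Decode one E4M3 byte field by field (assumed layout: s eeee mmm, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # the single NaN encoding per sign
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # zero and subnormals
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def e4m3_via_f16_bits(byte):
    """Bit-trick decode: reuse the float16 format instead of branching.

    The 7 exponent/mantissa bits are shifted so they line up with the low
    exponent bits and the top mantissa bits of a float16. The result is off
    by the bias difference (15 - 7 = 8), so it is multiplied by 2**8.
    NaN inputs are not preserved by this trick.
    """
    bits = ((byte & 0x80) << 8) | ((byte & 0x7F) << 7)
    half = struct.unpack("<e", struct.pack("<H", bits))[0]
    return half * 256.0


one = e4m3_via_f16_bits(0x38)
largest = e4m3_via_f16_bits(0x7E)
```

Scaling by a power of two, rather than adding 8 directly to the exponent field, keeps zero and subnormal inputs correct, since those have no implicit leading one to rebias.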

Other Changes:

python/bitblas/wrapper/general.py: Modified the legalize_c function to handle cases where dynamic_symbolic_set is not empty.
testing/python/operators/test_general_matmul_splitk_ops.py: Added additional calls to matmul.forward for testing purposes.

LeiWang1999 and others added 29 commits May 21, 2024 11:51

improve e4m3 decoding.

75d2f3d

Merge branch 'main' of https://github.com/microsoft/BitBLAS into main

dd744d0

append fp16xint1

00bfa31

Update submodule commit reference

8cd8b10

chore: Update shared memory scope for float32 output dtype

9122ff7

BUGFIX: UINT8/INT8 Decoding

b508acc

feat: Add rasterization options for roller module

58d55b7

Refactor tensorcore_legalization method to optimize tensor core usage

e7547ce

feat: Add function to collect variables from expression, improve for …

678a2e1

…splitk

chore: Update typing import in __init__.py

3088b35

chore: Refactor CPU execution of operators

5d206b3

Refactor matmul implementation for splitk layout

e06ce10

Refactor matmul implementation for splitk layout

d67cc6d

Refactor matmul implementation for splitk layout

9e36b6d

chore: Update version to 0.0.1.dev8

e1a0149

chore: Enable debug output in bitblas.set_debug_level()

df0ed7a

Refactor Linear module matmul implementation for splitk layout

a0f651a

Refactor matmul implementation for splitk layout

88295a7

Merge branch 'main' of https://github.com/microsoft/BitBLAS into lei/…

3366dce

…splitk

Refactor CUDA kernel launch string for dynamic symbolic set

25b5c63

Bump version to v0.0.1.dev9

26a9f1b

Merge branch 'main' of https://github.com/microsoft/BitBLAS into lei/…

251bf08

…splitk

Refactor CUDA kernel launch string for dynamic symbolic set

e0cf62c

Bump version to v0.0.1.dev10

2e4e8dd

Merge branch 'main' into lei/splitk

0dec7d8

Refactor CUDA kernel launch string for dynamic symbolic set

81f5b9a

Merge branch 'lei/splitk' of https://github.com/LeiWang1999/MSBitBLAS …

ec64f91

…into lei/splitk

Bump version to v0.0.1.dev12 and add MatmulConfigWithSplitK and Matmu…

5e71163

…lWithSplitK

Merge branch 'main' into lei/splitk

d0e0726

LeiWang1999 merged commit c090df6 into microsoft:main Jun 6, 2024
3 checks passed