[Dev][TL] Hardware Aware Tuning Examples with TL #201
Merged
This pull request includes several changes to improve the scheduling and tuning capabilities in the `bitblas` module, along with some code refactoring and cleanup. The most important changes include updating the `ThreadPoolExecutor` usage, adding hardware-aware configuration methods, introducing a new fine-grained matrix multiplication scheduler, and making various code style improvements.

**Enhancements to Scheduling and Tuning:**

- `bitblas/base/utils.py`: Changed the `ThreadPoolExecutor` to use a dynamic number of workers based on the `max_workers` parameter.
- `bitblas/ops/base_scheduler.py`: Added a method to get hardware-aware configurations for matrix multiplication schedulers.
- `bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.py`: Added methods to get hardware-aware configurations for CUDA architectures.

**New Scheduler Introduction:**

- `bitblas/ops/general_matmul/tilelang/dense/matmul_simt.py`: Introduced `MatmulFineGrainSIMTScheduler`, a new fine-grained matrix multiplication scheduler.

**Code Refactoring and Cleanup:**

- `bitblas/ops/operator.py`: Refactored multiple methods for better readability and maintainability, including `apply_fast_tuning`, `hardware_aware_finetune`, and `_build_default_module`.
- `bitblas/ops/general_matmul/tilelang/dense/matmul.py` → `matmul_tensorcore.py`: Renamed the file for better clarity and organization.
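The `ThreadPoolExecutor` change in `bitblas/base/utils.py` follows a common pattern: honor an explicit `max_workers` argument when given, otherwise fall back to the machine's CPU count rather than a hard-coded pool size. A minimal sketch of that pattern (the `tune_in_parallel` helper and its signature are illustrative assumptions, not BitBLAS's actual code):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def tune_in_parallel(candidates, compile_fn, max_workers=None):
    """Compile tuning candidates concurrently (illustrative sketch).

    If max_workers is None, fall back to the CPU count so the pool
    scales with the machine instead of using a fixed size.
    """
    workers = max_workers if max_workers is not None else os.cpu_count() or 1
    # Never spawn more workers than there are candidates to compile.
    workers = min(workers, len(candidates)) or 1
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(compile_fn, candidates))
```

Sizing the pool from `max_workers` (with a CPU-count fallback) keeps fast-tuning throughput proportional to the host rather than tied to a fixed constant.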
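The hardware-aware configuration methods added to `base_scheduler.py` and `matmul_tensorcore.py` presumably select candidate tile configurations based on the target GPU architecture. A rough sketch of that idea, with made-up config tables, tile sizes, and method names (not BitBLAS's actual API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatmulConfig:
    """One candidate tiling for the matmul scheduler (illustrative)."""
    block_m: int
    block_n: int
    block_k: int
    num_stages: int

# Hypothetical per-architecture candidate spaces: newer architectures
# have more shared memory and tend to benefit from larger tiles and
# deeper software pipelines, so their candidates differ.
_ARCH_CONFIGS = {
    "sm_80": [MatmulConfig(128, 128, 32, 2), MatmulConfig(128, 64, 32, 3)],
    "sm_90": [MatmulConfig(128, 256, 64, 3), MatmulConfig(256, 128, 64, 4)],
}

def get_hardware_aware_configs(arch: str) -> list:
    """Return tuning candidates appropriate for the given CUDA arch."""
    if arch not in _ARCH_CONFIGS:
        raise ValueError(f"unsupported CUDA architecture: {arch}")
    return _ARCH_CONFIGS[arch]
```

Keying the candidate space on the architecture lets the fine-tuner search only configurations that can actually fit the target's shared memory and register budget, which is the point of "hardware-aware" tuning.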