[Initilization] refactor docs (#9)

microsoft · Mar 11, 2024 · 4d6a997
1 parent 94dd027
commit 4d6a997

Showing 2 changed files with 5 additions and 14 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -7,14 +7,7 @@ That would be awesome if you want to contribute something to BitBLAS!
   - [Asking Questions](contributing.md#asking-questions)
   - [Submitting Pull Requests](contributing.md#submitting-pull-requests)
   - [Repository Setup](contributing.md#repository-setup)
-  - [Running Examples](contributing.md#running-examples)
   - [Running Tests](contributing.md#running-tests)
-  - [Testing Input Methods](contributing.md#testing-input-methods)
-  - [Publishing Releases](contributing.md#publishing-releases)
-    - [Publishing Normal `@latest` Release](contributing.md#publishing-normal-latest-release)
-    - [Publishing `@next` Release](contributing.md#publishing-next-release)
-    - [Publishing `@experimental` Release](contributing.md#publishing-experimental-release)
-    - [Running Prerelease Script](contributing.md#running-prerelease-script)
 
 ## Reporting Bugs
 

diff --git a/README.md b/README.md
@@ -7,14 +7,11 @@ Some of the key features of BitBLAS include:
   - High Performance (Not only FP16xFP16, INT8xINT8, but also FP16xINT4/2/1, INT8xINT4/2/1).
   - With the flexible DSL (TIR Script) to effortlessly craft domain-specific kernels for your situations.
   - Support with dynamic symbolic throuth tvm unity -> generate source code with dynamic shape.
-
-Latest News 🔥
-
-- 2023-03-03: BitBLAS first proposed int8xint1 gemv/gemm with 10x/2x speedup over float16xfloat16 on A100, please checkout [op_benchmark_a100_int1_scaling](images/figures/op_benchmark_a100_int1_scaling.png) for detailed input scaling benchmark results.
+  - BitBLAS first proposed int8xint1 gemv/gemm with 10x/2x speedup over float16xfloat16 on A100, please checkout [op_benchmark_a100_int1_scaling](images/figures/op_benchmark_a100_int1_scaling.png) for detailed input scaling benchmark results.
 
 
 ## Benchmark
-BitBLAS can achieve optimal performance across various compute pattern:
+BitBLAS can achieve optimal performance across various compute patterns:
 
 - GTX 3090
   - FLOAT16xFLOAT16 with TensorCore ![3090-gemm-fp16](./images/figures/op_benchmark_3090_fp16_gemm.png)
@@ -52,5 +49,6 @@ This project may contain trademarks or logos for projects, products, or services
 ## Acknowledgement
 
 We learned a lot from the following projects.
-- [Apache TVM](https://github.com/apache/tvm): We use TensorIR as our DSL currently, and we customized tvm from unity branch to support some features we needed.
-- [Microsoft Roller](https://github.com/microsoft/nnfusion/tree/roller): The design and algo inspiration of hardware aware tuning comes from Roller.
+
+- [Apache TVM](https://github.com/apache/tvm): BitBLAS havs adopted TensorIR as our DSL. Additionally, we have customized TVM from the unity branch to incorporate specific features that were required for our project.
+- [Microsoft Roller](https://github.com/microsoft/nnfusion/tree/roller): The design and algo inspiration of hardware aware tuning in BitBLAS comes from Roller,.