[WIP]Add ncu report analyzer #2497

Open · wants to merge 2 commits into main
Conversation

@FindHao (Member) commented Oct 8, 2024

This PR adds an ncu report analyzer that parses the profiled ncu report. It also adds two metrics, memory_traffic and arithmetic_intensity. To avoid excessive profiling overhead, we profile only the necessary ncu metrics.
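As a rough sketch of what the two new metrics mean (a hypothetical helper, not this PR's implementation): memory_traffic is the bytes read and written by a kernel, and arithmetic intensity is the ratio of floating-point operations to those bytes.

```python
# Hypothetical sketch (not the PR's code): arithmetic intensity is
# FLOPs per byte of memory traffic moved by a kernel.
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs per byte; returns 0.0 when no traffic was recorded."""
    return flops / bytes_moved if bytes_moved else 0.0

# Illustrative numbers only: 100 FLOPs over 50 bytes of traffic.
print(arithmetic_intensity(100.0, 50.0))  # → 2.0
```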

This PR is part of the operator benchmarking plan.

Example commands:

python run_benchmark.py triton --op fp8_gemm --num-inputs 1  --metrics ncu_rep,memory_traffic,arithmetic_intensity

Example output:

  0%|          | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 1289285 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
  0%|          | 0/1 [00:00<?, ?it/s]==PROF== Profiling "sm90_xmma_gemm_e4m3f16_e4m3f3..." - 0: 0%....50%....100% - 3 passes
100%|██████████| 1/1 [00:02<00:00,  2.34s/it]
             x_val    torch_fp8_gemm-_ncu_trace_in_task
------------------  -----------------------------------
(1024, 1024, 1024)                              success
==PROF== Disconnected from process 1289285
==WARNING== No source files were imported. Check that the target application was compiled with -lineinfo.
==PROF== Report: /scratch/yhao/tmp/tritonbench/fp8_gemm/ncu_traces/torch_fp8_gemm_0/ncu_output.ncu-rep
==PROF== Connected to process 1289431 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
  0%|          | 0/1 [00:00<?, ?it/s]==PROF== Profiling "matmul_kernel" - 0: 0%....50%....100% - 3 passes
100%|██████████| 1/1 [00:05<00:00,  5.25s/it]
             x_val    triton_fp8_gemm-_ncu_trace_in_task
------------------  ------------------------------------
(1024, 1024, 1024)                               success
==PROF== Disconnected from process 1289431
==PROF== Report: /scratch/yhao/tmp/tritonbench/fp8_gemm/ncu_traces/triton_fp8_gemm_0/ncu_output.ncu-rep
100%|██████████| 1/1 [00:14<00:00, 14.40s/it]
             x_val    torch_fp8_gemm-arithmetic_intensity    torch_fp8_gemm-memory_traffic                                                                 torch_fp8_gemm-ncu_rep    triton_fp8_gemm-arithmetic_intensity    triton_fp8_gemm-memory_traffic                                                                 triton_fp8_gemm-ncu_rep
------------------  -------------------------------------  -------------------------------  -------------------------------------------------------------------------------------  --------------------------------------  --------------------------------  --------------------------------------------------------------------------------------
(1024, 1024, 1024)              (1.3621756724589384, 0.0)               (2150656.0, 256.0)  /scratch/yhao/tmp/tritonbench/fp8_gemm/ncu_traces/torch_fp8_gemm_0/ncu_output.ncu-rep                              (0.0, 0.0)                  (2116096.0, 0.0)  /scratch/yhao/tmp/tritonbench/fp8_gemm/ncu_traces/triton_fp8_gemm_0/ncu_output.ncu-rep

import ncu_report
from collections import defaultdict

# Save all kernels' metrics: {metric_name: [kernel1_metric_value, kernel2_metric_value, ...]}
results = defaultdict(list)
@FindHao (Member Author) commented:
@xuzhao9
Any suggestions on how we should save this data? We need to keep the metric results for each kernel, but we also need aggregated results, right? For example, the memory traffic (both read and write) for the whole operator should be the sum of all kernels' read and write traffic.

@FindHao (Member Author) commented:
@xuzhao9 @eellison
Do you think the arithmetic intensity of the whole operator can be represented as a weighted average based on execution time?
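The weighted average proposed above could be sketched like this (hypothetical helper, illustrative numbers): weight each kernel's arithmetic intensity by its share of total execution time.

```python
# Hypothetical sketch: time-weighted average of per-kernel arithmetic
# intensities, weighting each kernel by its execution time.
def weighted_arithmetic_intensity(intensities, durations):
    """Time-weighted mean of per-kernel arithmetic intensities."""
    total_time = sum(durations)
    if total_time == 0:
        return 0.0
    return sum(ai * t for ai, t in zip(intensities, durations)) / total_time

# Two kernels: intensity 2.0 for 3 time units, 8.0 for 1 time unit.
print(weighted_arithmetic_intensity([2.0, 8.0], [3.0, 1.0]))  # → 3.5
```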
