Profiling the node(s) #14

Open
arkhadem opened this issue Jan 11, 2024 · 8 comments
@arkhadem

Hi,

I need to profile some HPC applications at the microarchitectural level, collecting events such as cache hit/miss rates. Based on my understanding, I should use the AMD uProf profiler. Could you please let me know whether this profiler is available on the HPC cloud and, if so, how I can access it?

Thank you in advance.

@tom-papatheodore
Collaborator

Hey Alireza-

I don't believe we currently have uProf on the cluster, but we can potentially install it so it's available as an environment module. Is it the CPUs or GPUs you're looking to profile? If the latter, you can use either rocprof or omniperf. Let me know what you need and we can go from there.

-Tom

@arkhadem
Author

Hi Tom,

Thanks for getting back to me.

I have some HPC applications implemented with MPI and OpenMP on the CPU and HIP on the GPU. Honestly, I am new to the AMD world and have no experience profiling AMD hardware. I am looking for an AMD CPU profiler comparable to Intel VTune and an AMD GPU profiler comparable to NVIDIA Nsight Compute. I want to measure MPI overhead, program hotspots (top-down analysis), detailed performance counters such as cache hit rate and branch misprediction rate (and MPKI), memory bandwidth and latency, utilization, etc. Based on my brief research, I found AMD uProf for the CPU and the AMD Radeon GPU Profiler for the GPU.

Hence, I would appreciate any insights on these profilers, as well as having them installed on the HPC servers as modules.

Thank you very much for your time and consideration.

- Alireza

@arkhadem
Author

arkhadem commented Feb 7, 2024

Hi Tom,

Do you have any updates on this issue? My research is blocked until the profilers are available. I would appreciate it if you could install the tools as modules and let me know how to access them.

Sincerely,

- Alireza

@tom-papatheodore
Collaborator

Hey Alireza-

omniperf, omnitrace, and rocprof are the AMD counterparts to NVIDIA's Nsight Compute, Nsight Systems, and nvprof, respectively. omniperf is currently available on the cluster as an environment module, and rocprof is installed as part of ROCm, so you should be able to get started with these tools now.

@koomie Can we install omnitrace and uProf on the cluster?

Here are the relevant docs to help you get started, Alireza:

-Tom

@koomie
Collaborator

koomie commented Feb 7, 2024

FYI, omniperf uses rocprof under the covers to access a variety of hardware counters (it runs your application multiple times so that it can gather a range of counters for each GPU kernel). I suspect this is probably the tool you want to start with.

@arkhadem
Author

Hi @tom-papatheodore and @koomie,
I found rocprof under the rocm module, and I think that will be enough for the GPU. Thanks for sending the links; they are comprehensive and useful.

For CPU profiling, however, I think I still need uProf. Could you let me know the status of the uProf installation?

@arkhadem
Author

arkhadem commented Mar 5, 2024

Hi @tom-papatheodore and @koomie,

Do you have any updates on this?

Best,

- Alireza

@koomie
Collaborator

koomie commented Jul 10, 2024

Yes, and apologies for the delay. We have installed uProf across the system. There is no module for it yet, but you can access the binaries directly at: /opt/AMDuProf_4.2-850/bin/
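
Since there is no module yet, one way to use the install is to put that directory on your PATH for the current session. This is only a minimal sketch: the directory is the one quoted above, while the executable name `AMDuProfCLI` is an assumption about what lives in that directory, so the snippet checks for it before calling it.

```shell
# Install location quoted in the comment above.
UPROF_BIN=/opt/AMDuProf_4.2-850/bin

# Prepend it to PATH for the current shell session only.
export PATH="$UPROF_BIN:$PATH"

# Assumption: the command-line profiler in that directory is named AMDuProfCLI.
# Check that it resolves before relying on it.
if command -v AMDuProfCLI >/dev/null 2>&1; then
    AMDuProfCLI --help
else
    echo "AMDuProfCLI not found under $UPROF_BIN" >&2
fi
```

To make this persistent, the two lines that set `UPROF_BIN` and `PATH` could go in your shell startup file until a module is available.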

As Tom mentioned, Omniperf is a good tool for detailed single-node GPU analysis with hardware counters, and you can access it via the pre-installed modules on the system (e.g. module load omniperf).
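
As a rough sketch of what a first Omniperf run could look like after loading the module: the application path `./my_hip_app` and the workload name `my_run` below are hypothetical placeholders, not anything installed on the cluster, and the guards simply skip each step when the command is unavailable.

```shell
# Hypothetical application binary and workload name, for illustration only.
APP=./my_hip_app
WORKLOAD=my_run

# Load the module if the environment-modules command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module load omniperf
fi

# Profile the application; omniperf reruns it several times to collect counters.
if command -v omniperf >/dev/null 2>&1; then
    omniperf profile -n "$WORKLOAD" -- "$APP"
else
    echo "omniperf is not on PATH; load the module first" >&2
fi
```

Because the application is executed more than once, this works best with a short, deterministic run.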
