NUMA Aware allocations #9444

mjkpolo · 2024-09-11T23:54:29Z

Which threads are responsible for calling ggml_backend_tensor_alloc? Running with --numa distribute increases tok/sec, so something is happening, but it's hard to tell how many threads there are and which of them actually calls ggml_backend_tensor_alloc. My understanding is that cudaMallocHost itself isn't NUMA-aware, so the thread calling it must be.
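
To make the concern concrete, here is a minimal sketch of the "the calling thread must be NUMA-aware" idea. It assumes Linux and its default first-touch (local allocation) policy, where a page lands on the node of the CPU that first writes to it. It does not use CUDA or ggml: a plain bytearray stands in for the host buffer, and the helper names (pin_to_cpu, alloc_and_touch, alloc_on_cpu) are hypothetical, not part of llama.cpp.

```python
import os

# Page size of the running system, used to touch each page exactly once.
PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def pin_to_cpu(cpu):
    """Restrict the caller (pid 0) to one CPU; return the previous affinity set."""
    previous = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})
    return previous


def alloc_and_touch(num_bytes):
    """Allocate a buffer and write to every page while the caller is still pinned.

    Assumption: under first-touch, the pages are placed on the NUMA node
    that owns the CPU doing these writes.
    """
    buf = bytearray(num_bytes)
    for offset in range(0, num_bytes, PAGE_SIZE):
        buf[offset] = 1
    return buf


def alloc_on_cpu(cpu, num_bytes):
    """Pin, allocate and touch, then restore the original affinity."""
    previous = pin_to_cpu(cpu)
    try:
        return alloc_and_touch(num_bytes)
    finally:
        os.sched_setaffinity(0, previous)


if __name__ == "__main__":
    cpu = min(os.sched_getaffinity(0))  # any CPU this process may run on
    buf = alloc_on_cpu(cpu, 16 * PAGE_SIZE)
    print(f"allocated {len(buf)} bytes while pinned to CPU {cpu}")
```

If the same reasoning applies to pinned host memory, then which thread performs the allocation (and where it is running at that moment) would decide where the buffer ends up, which is why I'd like to know the caller.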

Replies: 0 comments
