How can I enable PagedAttention for Llama-3-8B in vLLM? #8883
Unanswered
Blueblack319 asked this question in Q&A
Replies: 0 comments
I’m running the Llama-3-8B model in vLLM and checked the nsys report. According to the report, neither paged_attention_v1_kernel nor paged_attention_v2_kernel was launched. I also inspected which attention backend is selected in the get_attn_backend() function and found that is_blocksparse is always set to False. How can I enable PagedAttention for the Llama-3-8B model?
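For reference, here is a minimal sketch of how I’m running the model; the model id, prompt, and sampling parameters are illustrative placeholders rather than my exact script:

```python
import os

from vllm import LLM, SamplingParams

# If a specific attention backend needs to be forced, vLLM reads
# VLLM_ATTENTION_BACKEND from the environment. Which value (if any) routes
# through the paged_attention_v1/v2 kernels is exactly what I'm trying to
# confirm, so this is left commented out as an assumption.
# os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

# Assumed HF model id for Llama-3-8B; swap in whatever checkpoint you use.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is PagedAttention?"], sampling_params)
print(outputs[0].outputs[0].text)
```

I profile this script with nsys and then search the kernel trace for paged_attention_v1_kernel / paged_attention_v2_kernel, which is how I noticed they never appear.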