How can I enable PagedAttention for Llama-3-8B in vLLM? #8883
Unanswered
Blueblack319 asked this question in Q&A
Replies: 0 comments
I’m running the Llama-3-8B model in vLLM and checked the nsys report. According to the report, neither paged_attention_v1_kernel nor paged_attention_v2_kernel was launched. I also inspected which attention backend is selected in the get_attn_backend() function and found that is_blocksparse is always set to False. How can I enable PagedAttention for the Llama-3-8B model?
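For reference, here is a minimal sketch of how I’m running the model; the model id, prompt, and sampling parameters are illustrative placeholders rather than my exact script:

```python
import os

from vllm import LLM, SamplingParams

# If a specific attention backend needs to be forced, vLLM reads
# VLLM_ATTENTION_BACKEND from the environment. Which value (if any) routes
# through the paged_attention_v1/v2 kernels is exactly what I'm trying to
# confirm, so this is left commented out as an assumption.
# os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

# Assumed HF model id for Llama-3-8B; swap in whatever checkpoint you use.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is PagedAttention?"], sampling_params)
print(outputs[0].outputs[0].text)
```

I profile this script with nsys and then search the kernel trace for paged_attention_v1_kernel / paged_attention_v2_kernel, which is how I noticed they never appear.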