Issues: vllm-project/vllm

Issues list

[Bug]: Using FlashInfer with FP8 model with FP8 KV cache produces an error (bug)
#8641 opened Sep 19, 2024 by Syst3m1cAn0maly

[Performance]: The accept rate of typical acceptance sampling (performance)
#8639 opened Sep 19, 2024 by hustxiayang

[Usage]: Ray + vLLM OpenAI (offline) Batch Inference (usage)
#8636 opened Sep 19, 2024 by mbuet2ner

[Bug]: memory leak (bug)
#8629 opened Sep 19, 2024 by wciq1208

[Bug]: Speculative decoding interferes with CPU-only execution (bug)
#8628 opened Sep 19, 2024 by NickLucche

[Bug]: MistralTokenizer Detokenization Issue (bug)
#8627 opened Sep 19, 2024 by ywang96

[Usage]: doesn't work on pascal tesla P100 (usage)
#8626 opened Sep 19, 2024 by Stargate256

[Bug]: Wrong "completion_tokens" counts in streaming usage (bug)
#8625 opened Sep 19, 2024 by yuhon0528

[Bug]: vllm deploy medusa, draft acceptance rate: 0.000 (bug)
#8620 opened Sep 19, 2024 by xhjcxxl

[Usage]: Number of requests currently in the queue (usage)
#8617 opened Sep 19, 2024 by shubh9m

[Usage]: Standalone Debugging and Measuring the vLLM Engine Backend (usage)
#8586 opened Sep 19, 2024 by htang2012

[Usage]: How to run VLLM on multiple tpu hosts V4-32 (usage)
#8582 opened Sep 18, 2024 by sparsh35

[Feature]: DRY Sampling (feature request)
#8581 opened Sep 18, 2024 by Shreyansh1311

[Bug]: Wrong Response with Gemma2 with 8k context length (bug)
#8580 opened Sep 18, 2024 by hahmad2008

[Bug]: lm-format-enforcer guided decoding kills MQLLMEngine (bug)
#8578 opened Sep 18, 2024 by joerunde

[Usage]: (usage)
#8569 opened Sep 18, 2024 by lauhaide