How Effective is vLLM When a Prefix Is Thrown Into the Mix?

Figure 16. Translation workload where the input prompts share a common prefix. The prefix includes (a) 1 example with 80 tokens or (b) 5 examples with 341 tokens.

Figure 17. Performance on chatbot workload.

Fig. 10. For the model, we use LLaMA-13B [52], which is multilingual. For the workload, we use the WMT16 [4] English-to-German translation dataset and synthesize two prefixes, each consisting of an instruction and a few translation examples. The first prefix includes a single example (i.e., one-shot), while the second includes 5 examples (i.e., few-shot). As shown in Fig. 16 (a), vLLM achieves 1.67× higher throughput than Orca (Oracle) when the one-shot prefix is shared. When more examples are shared (Fig. 16 (b)), the gap widens: vLLM achieves 3.58× higher throughput than Orca (Oracle).
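One way to build intuition for why the gain grows with the prefix length is to count KV-cache blocks for the prompts. The sketch below is a toy accounting model, not vLLM's implementation. It assumes a block size of 16 tokens, hypothetical per-request input lengths, and that only completely filled prefix blocks are shared across requests. The function names are made up for illustration.

```python
BLOCK_SIZE = 16  # tokens per KV block (assumed value for this sketch)


def full_blocks(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of completely filled blocks in a sequence of num_tokens."""
    return num_tokens // block_size


def blocks_needed(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks required to hold num_tokens (the last block may be partial)."""
    return -(-num_tokens // block_size)  # ceiling division


def prompt_blocks(prefix_len, input_lens, share_prefix, block_size=BLOCK_SIZE):
    """Physical blocks needed to hold the prompts of a batch.

    prefix_len:   tokens in the common prefix
    input_lens:   tokens in each request's own input (hypothetical values)
    share_prefix: if True, full prefix blocks are stored once for the batch
    """
    if not share_prefix:
        # Every request stores its own copy of prefix + input.
        return sum(blocks_needed(prefix_len + n, block_size) for n in input_lens)

    shared = full_blocks(prefix_len, block_size)
    # Prefix tokens that do not fill a whole block stay in per-request blocks.
    tail = prefix_len - shared * block_size
    private = sum(blocks_needed(tail + n, block_size) for n in input_lens)
    return shared + private


if __name__ == "__main__":
    inputs = [20, 30, 40]  # made-up input lengths for three requests
    for prefix_len in (80, 341):  # prefix sizes from Fig. 16 (a) and (b)
        no_share = prompt_blocks(prefix_len, inputs, share_prefix=False)
        share = prompt_blocks(prefix_len, inputs, share_prefix=True)
        print(f"prefix={prefix_len}: {no_share} blocks unshared, {share} shared")
```

With these assumed input lengths, sharing the 80-token prefix reduces the prompt blocks from 22 to 12, and sharing the 341-token prefix reduces them from 71 to 29. The model only counts prompt memory for one batch; it does not reproduce the throughput numbers reported above.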

:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::

:::info Authors:

(1) Woosuk Kwon, UC Berkeley (equal contribution);

(2) Zhuohan Li, UC Berkeley (equal contribution);

(3) Siyuan Zhuang, UC Berkeley;

(4) Ying Sheng, UC Berkeley and Stanford University;

(5) Lianmin Zheng, UC Berkeley;

(6) Cody Hao Yu, Independent Researcher;

(7) Joseph E. Gonzalez, UC Berkeley;

(8) Hao Zhang, UC San Diego;

(9) Ion Stoica, UC Berkeley.

:::

This content originally appeared on HackerNoon and was authored by Writings, Papers and Blogs on Text Models


Table of Links

6.4 Shared prefix
