Where does In-context Translation Happen in Large Language Models: Inference Efficiency

In this study, researchers attempt to characterize the region where large language models transition from in-context learners to translation models.


This content originally appeared on HackerNoon and was authored by Computational Technology for All

:::info Authors:

(1) Suzanna Sia, Johns Hopkins University;

(2) David Mueller;

(3) Kevin Duh.

:::

5. Inference Efficiency

Speeding up transformer inference is of great interest to the community (Fournier et al., 2023). We highlight the potential for speeding up inference as a direct consequence of identifying where task recognition occurs in the model and the redundancy of self-attention processing thereafter. Our results indicate that we can achieve significant speedups in inference by removing the processing of context tokens altogether after a certain point in the model, with little to no impact on downstream performance.
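To make the idea concrete, here is a minimal sketch of what dropping the context after a chosen layer could look like. It is not the authors' implementation: the layer type, dimensions, the cut-off layer r, and the `n_context_tokens` argument are all illustrative assumptions, and a standard encoder layer with a causal mask stands in for a decoder-only LM block.

```python
import torch
import torch.nn as nn

class TruncatedContextDecoder(nn.Module):
    """Toy decoder-style stack: after layer r, the in-context example tokens
    are dropped entirely, so the remaining layers only process the rest of
    the sequence. A stand-in for the idea, not the paper's actual model."""

    def __init__(self, n_layers=8, d_model=64, n_heads=4, r=4):
        super().__init__()
        self.r = r  # layer index after which context tokens are discarded
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(n_layers)
        ])

    @staticmethod
    def causal_mask(seq_len):
        # True above the diagonal = this position may not be attended to.
        return torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

    def forward(self, x, n_context_tokens):
        # x: (batch, seq_len, d_model); the first n_context_tokens positions
        # hold the in-context examples / instructions.
        mask = self.causal_mask(x.size(1))
        for i, layer in enumerate(self.layers):
            if i == self.r:
                # Stop processing the context: later layers see a shorter sequence.
                x = x[:, n_context_tokens:, :]
                mask = self.causal_mask(x.size(1))
            x = layer(x, src_mask=mask)
        return x

model = TruncatedContextDecoder()
hidden = torch.randn(2, 20, 64)          # 12 context tokens + 8 query tokens
out = model(hidden, n_context_tokens=12)
print(out.shape)                          # torch.Size([2, 8, 64])
```

Because the sequence is physically shortened at layer r, every subsequent layer's attention and feed-forward cost, along with its activation memory, scales with the shorter length.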


Suppose the prompt contains k examples and context processing can be stopped after some layer r. Then, for a model with nℓ layers, the amount of processing saved, in terms of both speed and memory, is approximately (nℓ − r)/nℓ × k/(k + 1).

Using the example of LLAMA7B (32 layers), we see from Figure 2 that the model is very close to its ceiling score after processing the examples at layer 14 (r = 14). If we no longer need to process the examples after r = 14, then with a prompt of k = 5 examples the savings are approximately 45%.
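As a quick check on that arithmetic, the estimate above can be evaluated directly (the helper function below is ours, not from the paper):

```python
def context_processing_savings(n_layers: int, r: int, k: int) -> float:
    """Approximate fraction of compute and memory saved when the k prompt
    examples are no longer processed after layer r, following
    (n_layers - r) / n_layers * k / (k + 1)."""
    return (n_layers - r) / n_layers * k / (k + 1)

# LLAMA7B-style setting from the text: 32 layers, cut-off at layer 14, 5 examples.
print(f"{context_processing_savings(n_layers=32, r=14, k=5):.1%}")  # 46.9%
```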

For instruction-tuned models, which are typically deployed in production, savings can be non-trivial even if no examples are provided, since very long instructions are typically given to the model in an attempt to control its behavior (prompt engineering).
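As a rough illustration of this point (restating the estimate in terms of token fractions rather than example counts is our own extrapolation, and the token counts below are invented), the same back-of-the-envelope reasoning applies to long instructions:

```python
def instruction_savings(n_layers: int, r: int,
                        instruction_tokens: int, total_tokens: int) -> float:
    # Same estimate as above, with the example fraction k/(k+1) replaced by
    # the share of the prompt taken up by instruction tokens.
    return (n_layers - r) / n_layers * instruction_tokens / total_tokens

# Hypothetical deployment: a 500-token system prompt ahead of a 100-token query.
print(f"{instruction_savings(32, 14, instruction_tokens=500, total_tokens=600):.1%}")  # 46.9%
```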


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::


