How Anchor Tokens Transform Sequence Information Compression in LLMs

This section highlights how our anchor-based approach builds on prior work in in-context learning and prompt compression. Unlike task-specific methods such as gist tokens, our model universally condenses sequence information into anchor tokens, improving efficiency across tasks. It also contrasts with memory-efficient attention mechanisms such as FlashAttention and PagedAttention by targeting sequence compression rather than computational optimization.



:::info Authors:

(1) Jianhui Pang, University of Macau (nlp2ct.pangjh3@gmail.com); this work was done while Jianhui Pang and Fanghua Ye were interning at Tencent AI Lab;

(2) Fanghua Ye, University College London (fanghua.ye.19@ucl.ac.uk); this work was done while Jianhui Pang and Fanghua Ye were interning at Tencent AI Lab;

(3) Derek F. Wong, University of Macau;

(4) Longyue Wang, Tencent AI Lab (corresponding author).

:::

Abstract and 1 Introduction

2 Related Work

3 Anchor-based Large Language Models

3.1 Background

3.2 Anchor-based Self-Attention Networks

3.3 Anchor-based Inference

4 Experiments and 4.1 Our Implementation

4.2 Data and Training Procedure

4.3 Evaluation

5 Results

6 Analysis

7 Conclusion, Limitations, Ethics Statement, and References

A More Experimental Results

B Data Settings

2 Related Work

Our research is inspired by the recent investigation of in-context learning (ICL) in LLMs by Wang et al. (2023). Examining the underlying mechanisms of ICL, they show that the label words in demonstration examples shape information flow: these label words act as anchors into which semantic information converges during inference, and the aggregated information then directs the LLMs’ final predictions. Motivated by their findings, we extend this behavior to general natural language modeling by guiding sequence information to compress into manually designed anchor tokens rather than relying solely on label words, which is crucial because natural language texts do not always contain explicit labels.

The method most closely related to ours is learning to compress prompts with gist tokens (Mu et al., 2023), which fine-tunes the model with a proposed gist masking scheme to enforce compression of task-specific prompts. However, there are several crucial differences between the two studies. Whereas they focus on compressing a task prompt, our objective is to train the LLM to condense sequence information into anchor tokens. Because anchor tokens are incorporated directly into the model’s language modeling, our approach applies universally across tasks without task-specific training, a property that gist tokens lack. Furthermore, our anchor-based attention masks govern both information compression within a sequence and information interaction between sequences, extending beyond the compression of task prompts alone; a toy sketch of such a mask follows below.
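To make the masking idea concrete, here is a minimal sketch (our illustration, not the authors’ released implementation) of an anchor-style attention mask, assuming each segment ends in a single anchor token: tokens attend causally within their own segment, while later segments reach earlier segments only through those anchors. The function name and toy segmentation are hypothetical.

```python
import torch

def anchor_attention_mask(segment_ids: torch.Tensor, is_anchor: torch.Tensor) -> torch.Tensor:
    """Boolean mask of shape [T, T]; entry (i, j) is True if query i may attend to key j."""
    T = segment_ids.shape[0]
    i = torch.arange(T).unsqueeze(1)                  # query positions, shape [T, 1]
    j = torch.arange(T).unsqueeze(0)                  # key positions,   shape [1, T]
    causal = j <= i                                   # standard causal constraint
    same_segment = segment_ids.unsqueeze(1) == segment_ids.unsqueeze(0)
    anchor_key = is_anchor.unsqueeze(0).expand(T, T)  # key j is an anchor token
    return causal & (same_segment | anchor_key)

# Toy example: two segments of three tokens each, the last token of each being its anchor.
segment_ids = torch.tensor([0, 0, 0, 1, 1, 1])
is_anchor = torch.tensor([False, False, True, False, False, True])
print(anchor_attention_mask(segment_ids, is_anchor).int())
# Tokens of the second segment see the first segment only through its anchor (position 2),
# which is what pushes the first segment's information to be compressed into that token.
```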

On the other hand, FlashAttention (Dao et al., 2022) and PagedAttention (Kwon et al., 2023) present memory-efficient attention mechanisms for LLMs. Whereas they optimize how attention is computed and how its memory is partitioned, our method specifically targets the compression of sequence information into anchor tokens, making it orthogonal to these existing works.
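As a rough illustration of this orthogonality, the hypothetical snippet below shows only the memory side of the picture, under the assumption that once a segment has been processed, just its anchor token’s keys and values need to remain in the KV cache for later segments to attend to; the helper `compress_kv_cache` is our own naming, not from the paper.

```python
import torch

def compress_kv_cache(keys: torch.Tensor, values: torch.Tensor, is_anchor: torch.Tensor):
    """keys/values: [T, d]; is_anchor: bool [T]. Keep only the anchor tokens' cache entries."""
    return keys[is_anchor], values[is_anchor]

# Six cached positions, two of which are anchors: the cache shrinks from 6 rows to 2.
T, d = 6, 8
keys, values = torch.randn(T, d), torch.randn(T, d)
is_anchor = torch.tensor([False, False, True, False, False, True])
kept_k, kept_v = compress_kv_cache(keys, values, is_anchor)
print(keys.shape, "->", kept_k.shape)  # torch.Size([6, 8]) -> torch.Size([2, 8])
```

Kernel-level methods such as FlashAttention would still apply unchanged to whatever remains in the cache, which is the sense in which the two directions are complementary.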


:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::
