Independent Science + Technology

Training speed on longer sequences

The research paper compares training speeds across different model sizes and sequence lengths to assess the computational advantages of Hawk and Griffin.

Post date January 14, 2025
Post categories In ai-models, deep-learning, hawk-and-griffin-models, language-models, nlp-research, rnn-models, scalable-ai, transformers

This content originally appeared on HackerNoon and was authored by Gating

:::info Authors:

(1) Soham De, Google DeepMind, equal contribution;

(2) Samuel L. Smith, Google DeepMind, equal contribution;

(3) Anushan Fernando, Google DeepMind, equal contribution;

(4) Aleksandar Botev, Google DeepMind, equal contribution;

(5) George Cristian-Muraru, Google DeepMind, equal contribution;

(6) Albert Gu, Work done while at Google DeepMind;

(7) Ruba Haroun, Google DeepMind;

(8) Leonard Berrada, Google DeepMind;

(9) Yutian Chen, Google DeepMind;

(10) Srivatsan Srinivasan, Google DeepMind;

(11) Guillaume Desjardins, Google DeepMind;

(12) Arnaud Doucet, Google DeepMind;

(13) David Budden, Google DeepMind;

(14) Yee Whye Teh, Google DeepMind;

(15) Razvan Pascanu, Google DeepMind;

(16) Nando De Freitas, Google DeepMind;

(17) Caglar Gulcehre, Google DeepMind.

:::

Table of Links

2. Model Architecture

3. Recurrent Models Scale as Efficiently as Transformers

3.1. Scaling curves

3.2. Evaluation on downstream tasks

4. Training Recurrent Models Efficiently on Device and 4.1. Model parallelism for large scale training

4.2. Efficient linear recurrences on device

4.3. Training speed on longer sequences

5. Inference Speed

5.1. A simple model of the decode step

6. Long Context Modeling and 6.1. Improving next token prediction with longer contexts

6.2. Copy and retrieval capabilities

7. Related Works

8. Conclusion, Acknowledgements, and References

A. RG-LRU Recurrence Gate

B. Complex-Gated Linear Recurrent Unit (CG-LRU)

C. Model Scale Hyper-Parameters

D. Efficient Linear Recurrences on Device

E. The Local Attention Window Size of Griffin

F. Inference Speeds

G. Improving Next Token Prediction with Longer Contexts: Additional Results

H. Additional Details of the Copy and Retrieval Tasks

4.3. Training speed on longer sequences
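The comparison across sequence lengths rests on a scaling argument that a rough FLOP count can illustrate. The sketch below is a back-of-the-envelope model under stated assumptions, not the paper's benchmark or method. It assumes about 24 * d_model^2 dense FLOPs per token per block (roughly 12 * d_model^2 weights at 2 FLOPs each, forward pass only), about 4 * seq_len * d_model FLOPs per token for global attention scores and value mixing, and a hypothetical constant cost per channel for a diagonal linear recurrence. All function names are illustrative. With tokens per batch held fixed, the attention term grows with sequence length while the recurrence term does not, so the cost ratio rises as sequences get longer.

```python
"""Back-of-the-envelope per-token FLOP model (illustrative assumptions only)."""


def dense_flops_per_token(d_model: int) -> int:
    # Assumption: ~12 * d_model^2 weights per block (projections plus a 4x-wide
    # MLP), 2 FLOPs per weight per token in the forward pass.
    return 2 * 12 * d_model * d_model


def global_attention_flops_per_token(seq_len: int, d_model: int) -> int:
    # Each token scores against seq_len keys (QK^T) and mixes seq_len values (AV):
    # 2 * seq_len * d_model multiply-adds = 4 * seq_len * d_model FLOPs.
    return 4 * seq_len * d_model


def linear_recurrence_flops_per_token(d_model: int, flops_per_channel: int = 4) -> int:
    # Hypothetical constant: an elementwise (diagonal) recurrence does a fixed
    # amount of work per channel per step, independent of seq_len.
    return flops_per_channel * d_model


def relative_cost(seq_len: int, d_model: int) -> float:
    """Per-token FLOP ratio of a global-attention block to a recurrent block.

    With tokens per batch held fixed, per-step compute is tokens * per-token
    FLOPs, so this is also the ratio of per-step compute.
    """
    attn = dense_flops_per_token(d_model) + global_attention_flops_per_token(seq_len, d_model)
    rec = dense_flops_per_token(d_model) + linear_recurrence_flops_per_token(d_model)
    return attn / rec


if __name__ == "__main__":
    for seq_len in (2048, 4096, 8192):
        ratio = relative_cost(seq_len, 1024)
        print(f"seq_len={seq_len:5d}  attention/recurrence FLOP ratio = {ratio:.2f}")
```

Measured training speed also depends on memory bandwidth and kernel efficiency, which a FLOP count ignores. Griffin's local attention would add a per-token term bounded by its window size rather than one that grows with seq_len.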


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::


