Large Language Models on Memory-Constrained Devices Using Flash Memory: Results for Falcon 7B Model

:::info
Authors:
(1) Keivan Alizadeh;
(2) Iman Mirzadeh, Major Contribution;
(3) Dmitry Belenko, Major Contribution;
(4) S. Karen Khatamifard;
(5) Minsik Cho;
(6) Carlo C Del Mundo;
(7) Mohammad Rastegari;
(8) Mehrdad Farajtabar.
:::
Table of Links

Abstract and 1. Introduction

2. Flash Memory & LLM Inference and 2.1 Bandwidth and Energy Constraints

2.2 Read Throughput

3 Load From Flash

3.1 Reducing Data Transfer

3.2 Improving Transfer Throughput with Increased Chunk Sizes

3.3 Optimized Data Management in DRAM

4 Results

4.1 Results for OPT 6.7B Model

4.2 Results for Falcon 7B Model

5 Related Works

6 Conclusion and Discussion, Acknowledgements and References

4.2 Results for Falcon 7B Model

To verify that our findings generalize beyond OPT models, we also applied the LLM-in-a-flash approach to the Falcon model. Since the baseline Falcon model is not sparse, we used a sparsified (relufied) version with almost the same performance as the base version (Mirzadeh et al., 2023). As in the previous section, we present results obtained under the condition that approximately half of the model size is available for use in DRAM.

Predictors. In the Falcon 7B model, predictors of rank r = 256 are used for the initial 28 layers, and r = 1152 for the final four layers.
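
These predictors anticipate which FFN neurons will be active for the current token, so only those weights need to be fetched from flash. Below is a minimal PyTorch sketch of such a rank-r predictor; the class name, the 0.5 threshold, and the Falcon 7B dimensions (hidden size 4544, FFN width 4 × 4544, 32 layers) are our assumptions for illustration, not code from the paper.

```python
import torch
import torch.nn as nn

class LowRankPredictor(nn.Module):
    """Rank-r predictor that guesses which FFN neurons will be active
    (nonzero after ReLU) for the current token, so only those rows and
    columns need to be loaded from flash. Names/threshold are illustrative."""

    def __init__(self, d_model: int, d_ffn: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)  # d_model -> r
        self.up = nn.Linear(rank, d_ffn, bias=False)      # r -> one score per neuron

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scores in [0, 1]; neurons above the cutoff are predicted active,
        # and their weights are fetched from flash before the FFN runs.
        scores = torch.sigmoid(self.up(self.down(x)))
        return scores > 0.5

# Ranks from the text: r = 256 for the first 28 layers, r = 1152 for the
# last four. Falcon 7B dimensions (assumed): hidden 4544, FFN 4 * 4544.
predictors = [
    LowRankPredictor(d_model=4544, d_ffn=4 * 4544, rank=256 if i < 28 else 1152)
    for i in range(32)
]
```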

Window Configuration. Our model reserves memory for a window containing the last 4 tokens, which keeps roughly 33% of the Feed Forward Network (FFN) neurons active. In terms of memory allocation, embeddings take 4.2% of the model size, attention weights account for 19.4%, and predictors require 4%. Since the FFN constitutes 76.8% of the model, the active portion given our window size is 25.3% (calculated as 0.33 × 76.8). Overall, this amounts to 52.93% of the model’s total size.
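
The DRAM budget above is simple arithmetic over the percentages given in the text; the short script below reproduces it as a sanity check (all figures are from the paragraph, none are assumed).

```python
# All figures are percentages of total model size, taken from the text.
embeddings = 4.2        # token embeddings
attention  = 19.4       # attention weights (always DRAM-resident)
predictors = 4.0        # low-rank predictors
ffn_share  = 76.8       # the FFN's share of the model
window_active = 0.33    # fraction of FFN neurons kept hot by a 4-token window

ffn_active = window_active * ffn_share          # ~25.3% of the model
total = embeddings + attention + predictors + ffn_active
print(f"DRAM-resident fraction: {total:.2f}%")  # 52.94%, i.e. the paper's
                                                # 52.93% up to rounding
```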

:::info This paper is available on arXiv under a CC BY-SA 4.0 DEED license.

:::
