A Duality Between Downweighted Residual and Restricting Updates In Linear Layers

The reparameterisation of the value and projection parameters is motivated by a duality in linear layers: downweighting a residual branch corresponds to restricting parameter updates through a smaller learning rate.


This content originally appeared on HackerNoon and was authored by Auto Encoder: How to Ignore the Signal Noise

:::info Authors:

(1) Bobby He, Department of Computer Science, ETH Zurich (Correspondence to: bobby.he@inf.ethz.ch.);

(2) Thomas Hofmann, Department of Computer Science, ETH Zurich.

:::

Abstract and Introduction

Related Work

Preliminaries

Simplifying Transformer Blocks

Further Experimental Analysis

Discussion, Reproducibility Statement, Acknowledgements and References

A Duality Between Downweighted Residual and Restricting Updates In Linear Layers

B Block Layouts

C Additional Experiments

D Implementation Details

A DUALITY BETWEEN DOWNWEIGHTED RESIDUALS AND RESTRICTING UPDATES IN LINEAR LAYERS

In Sec. 4.1, we motivated our reparameterisation of the value and projection parameters, Eq. (6), through a duality between downweighted residual branches and restricted parameter updates (materialised through smaller learning rates) in linear layers. This is a relatively simple argument, found elsewhere in the literature, e.g. Ding et al. (2023), which we outline here for completeness.

We suppose we have a (differentiable) loss function L(W), which is a function of some parameter matrix W. We consider taking a gradient step to minimise L, with learning rate η_W, from initialisation W_0. This would give new parameters W_1:

$$W_1 = W_0 - \eta_W \left.\frac{dL}{dW}\right|_{W = W_0}$$
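As a rough numerical illustration of the duality, the sketch below compares a plain gradient step on W with a step taken on a reparameterised matrix. The reparameterisation W' = U + beta * V (fixed U, fixed scalar beta, trainable V), the learning rate eta_V = eta_W / beta**2, the toy quadratic loss, and all helper names are assumptions introduced for this example; they are not taken from the excerpt above. By the chain rule, dL/dV = beta * dL/dW', so a step on V moves W' by eta_V * beta**2 times the gradient, which is why the two updates should agree when the learning rates are matched this way.

```python
# Sketch: a gradient step on W versus a step on V, where W' = U + beta * V.
# U, beta, V, eta_V and the quadratic loss are illustrative assumptions.

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def grad_W(W, x, y):
    """Gradient of L(W) = 0.5 * ||W x - y||^2, i.e. the outer product (W x - y) x^T."""
    r = [p - t for p, t in zip(matvec(W, x), y)]
    return [[r_i * x_j for x_j in x] for r_i in r]

def axpy(a, X, Y):
    """Return a * X + Y elementwise for two matrices of equal shape."""
    return [[a * x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

x, y = [1.0, -2.0, 0.5], [0.3, -1.0]
W0 = [[0.2, -0.1, 0.4], [0.0, 0.5, -0.3]]
eta_W = 0.1

# Direct step: W1 = W0 - eta_W * dL/dW evaluated at W0.
W1 = axpy(-eta_W, grad_W(W0, x, y), W0)

# Reparameterised step: W' = U + beta * V with U fixed and V trainable.
beta = 0.25
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]                    # fixed component (assumed)
V0 = [[v / beta for v in row] for row in axpy(-1.0, U, W0)]  # so U + beta * V0 == W0
eta_V = eta_W / beta**2                                    # learning rate matched to beta

# Chain rule: dL/dV = beta * dL/dW', evaluated at W' = W0.
gV = [[beta * g for g in row] for row in grad_W(W0, x, y)]
V1 = axpy(-eta_V, gV, V0)
W1_reparam = axpy(beta, V1, U)

# Largest elementwise difference between the two updated matrices.
max_gap = max(abs(a - b) for ra, rb in zip(W1, W1_reparam) for a, b in zip(ra, rb))
print(max_gap)
```

Read the other way round, keeping eta_V fixed and shrinking beta scales the effective learning rate on W' by beta**2, which is the sense in which a downweighted branch behaves like a restricted update in this toy setting.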

:::info This paper is available on arXiv under a CC 4.0 license.

:::
