DPO Hyperparameters and Implementation Details

This section offers a practical guide to implementing Direct Preference Optimization (DPO) in PyTorch for training language models. It covers the default hyperparameters: β = 0.1 (β = 0.5 for TL;DR summarization), a batch size of 64, and the RMSprop optimizer with a learning rate of 1e-6 and a linear warmup over the first 150 steps. The provided code and configuration make it straightforward to integrate DPO into existing model-training workflows.


This content originally appeared on HackerNoon and was authored by Writings, Papers and Blogs on Text Models

:::info Authors:

(1) Rafael Rafailov, Stanford University (equal contribution; more junior authors listed earlier);

(2) Archit Sharma, Stanford University (equal contribution; more junior authors listed earlier);

(3) Eric Mitchell, Stanford University (equal contribution; more junior authors listed earlier);

(4) Stefano Ermon, CZ Biohub;

(5) Christopher D. Manning, Stanford University;

(6) Chelsea Finn, Stanford University.

:::

Abstract and 1. Introduction

2 Related Work

3 Preliminaries

4 Direct Preference Optimization

5 Theoretical Analysis of DPO

6 Experiments

7 Discussion, Acknowledgements, and References

Author Contributions

A Mathematical Derivations

A.1 Deriving the Optimum of the KL-Constrained Reward Maximization Objective

A.2 Deriving the DPO Objective Under the Bradley-Terry Model

A.3 Deriving the DPO Objective Under the Plackett-Luce Model

A.4 Deriving the Gradient of the DPO Objective and A.5 Proof of Lemma 1 and 2

A.6 Proof of Theorem 1

B DPO Implementation Details and Hyperparameters

C Further Details on the Experimental Set-Up and C.1 IMDb Sentiment Experiment and Baseline Details

C.2 GPT-4 prompts for computing summarization and dialogue win rates

C.3 Unlikelihood baseline

D Additional Empirical Results

D.1 Performance of Best of N baseline for Various N and D.2 Sample Responses and GPT-4 Judgments

D.3 Human study details

B DPO Implementation Details and Hyperparameters

DPO is relatively straightforward to implement; PyTorch code for the DPO loss is provided below:

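The listing below is a minimal sketch of this loss rather than the paper's verbatim code, assuming each model's token log-probabilities have already been summed into per-sequence scores; the function and tensor names (`dpo_loss`, `pi_chosen_logps`, etc.) are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_logps: torch.Tensor,
             pi_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each argument is a (B,) tensor of summed token log-probabilities of
    the preferred (chosen) or dispreferred (rejected) completion under
    the trained policy (pi_*) or the frozen reference model (ref_*).
    beta controls the strength of the implicit KL penalty.
    """
    # Log-ratio of chosen over rejected under each model.
    pi_logratios = pi_chosen_logps - pi_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps

    # Negative log-sigmoid of beta times the gap between the policy's
    # preference margin and the reference model's preference margin.
    losses = -F.logsigmoid(beta * (pi_logratios - ref_logratios))
    return losses.mean()
```

Under this sketch, the implicit reward of a completion is β times its policy-versus-reference log-probability ratio, so the loss simply pushes the policy's margin between chosen and rejected completions above the reference model's margin.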

Unless otherwise noted, we use β = 0.1, a batch size of 64, and the RMSprop optimizer with a learning rate of 1e-6 by default. We linearly warm up the learning rate from 0 to 1e-6 over the first 150 steps. For TL;DR summarization we use β = 0.5, while the rest of the parameters remain the same.
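For concreteness, a hedged sketch of the corresponding optimizer setup, using `torch.optim.RMSprop` with a `LambdaLR` linear warmup; the `policy` stand-in and variable names are illustrative, not from the paper.

```python
import torch
from torch import nn

# Illustrative stand-in; in practice this is the language model being tuned.
policy = nn.Linear(10, 10)

beta = 0.1          # 0.5 for TL;DR summarization
batch_size = 64
warmup_steps = 150

# RMSprop at the default learning rate of 1e-6.
optimizer = torch.optim.RMSprop(policy.parameters(), lr=1e-6)

# Linear warmup from 0 to the base learning rate over the first 150 steps,
# then a constant learning rate thereafter.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)
```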


:::info This paper is available on arXiv under a CC BY-NC-ND 4.0 DEED license.

:::
