Deriving the DPO Objective Under the Plackett-Luce Model

The Plackett-Luce model [30, 21] generalizes the Bradley-Terry model from pairwise comparisons to full rankings. Like the Bradley-Terry model, it stipulates that when presented with a set of possible choices, people prefer a choice with probability proportional to the value of some latent reward function for that choice. In our context, when presented with a prompt $x$ and a set of $K$ answers $y_1, \ldots, y_K$, a user outputs a permutation $\tau : [K] \to [K]$ giving their ranking of the answers. The Plackett-Luce model stipulates that

$$
p^*(\tau \mid y_1, \ldots, y_K, x) = \prod_{k=1}^{K} \frac{\exp\big(r^*(x, y_{\tau(k)})\big)}{\sum_{j=k}^{K} \exp\big(r^*(x, y_{\tau(j)})\big)} \tag{18}
$$

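Notice that when $K = 2$, Equation 18 reduces to the Bradley-Terry model. For the general Plackett-Luce model, we can still use the result of Eq. 5 and substitute the reward function parameterized by its optimal policy. As in Appendix A.2, the normalization constant $Z(x)$ cancels out, and we are left with:

$$
p^*(\tau \mid y_1, \ldots, y_K, x) = \prod_{k=1}^{K} \frac{\exp\left(\beta \log \frac{\pi^*(y_{\tau(k)} \mid x)}{\pi_{\text{ref}}(y_{\tau(k)} \mid x)}\right)}{\sum_{j=k}^{K} \exp\left(\beta \log \frac{\pi^*(y_{\tau(j)} \mid x)}{\pi_{\text{ref}}(y_{\tau(j)} \mid x)}\right)} \tag{19}
$$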
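The Plackett-Luce ranking probability and the cancellation of the normalization constant can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the function names (`plackett_luce_log_prob`, `implicit_rewards`, `dpo_pl_nll`), the value of `beta`, and the example log-probabilities are all assumptions made for illustration. It computes the log-probability of a ranking from per-answer scores, and builds those scores from policy and reference log-probabilities as $\beta \log(\pi / \pi_{\text{ref}})$, leaving out the $\beta \log Z(x)$ term because it shifts every score for the same prompt by the same amount.

```python
import itertools
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def plackett_luce_log_prob(scores_in_rank_order: Sequence[float]) -> float:
    """Log-probability of a ranking under the Plackett-Luce model.

    scores_in_rank_order[k] is the score of the answer placed at rank k
    (best first), i.e. the score of y_tau(k).
    """
    total = 0.0
    for k in range(len(scores_in_rank_order)):
        # The answer at rank k competes against every answer ranked at or below it.
        total += scores_in_rank_order[k] - logsumexp(scores_in_rank_order[k:])
    return total


def implicit_rewards(policy_logps, ref_logps, beta):
    """beta * log(pi(y|x) / pi_ref(y|x)) per answer; beta * log Z(x) is omitted."""
    return [beta * (p - q) for p, q in zip(policy_logps, ref_logps)]


def dpo_pl_nll(policy_logps, ref_logps, beta):
    """Negative log-likelihood of one observed ranking (inputs in rank order)."""
    return -plackett_luce_log_prob(implicit_rewards(policy_logps, ref_logps, beta))


# Hypothetical sequence log-probabilities for K = 3 answers, best-ranked first.
policy_logps = [-4.0, -6.5, -5.0]
ref_logps = [-5.0, -6.0, -5.5]
beta = 0.1  # assumed value, for illustration only

scores = implicit_rewards(policy_logps, ref_logps, beta)

# K = 2: the ranking probability equals the Bradley-Terry sigmoid of the score gap.
pair = scores[:2]
bradley_terry = 1.0 / (1.0 + math.exp(-(pair[0] - pair[1])))
print(math.exp(plackett_luce_log_prob(pair)), bradley_terry)

# Adding the same constant to every score (the role played by beta * log Z(x))
# leaves the ranking probability unchanged.
shifted = [s + 3.0 for s in scores]
print(plackett_luce_log_prob(scores), plackett_luce_log_prob(shifted))

# The probabilities of all K! rankings sum to one.
print(sum(math.exp(plackett_luce_log_prob(p)) for p in itertools.permutations(scores)))

print(dpo_pl_nll(policy_logps, ref_logps, beta))
```

In practice the log-probabilities would come from summing token log-probabilities of each answer under the trained policy and the frozen reference model, and the loss would be averaged over a batch of ranked examples.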

:::info This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

:::

This content originally appeared on HackerNoon and was authored by Writings, Papers and Blogs on Text Models

APA

Writings, Papers and Blogs on Text Models | Sciencx (2024-08-25T21:14:29+00:00) Deriving the DPO Objective Under the Plackett-Luce Model. Retrieved from https://www.scien.cx/2024/08/25/deriving-the-dpo-objective-under-the-plackett-luce-model/

