Deriving the DPO Objective Under the Plackett-Luce Model

Posted August 25, 2024 by Writings, Papers and Blogs on Text Models
Categories: ai-fine-tuning, bradley-terry-model, direct-preference-optimization, language-model-optimization, language-models, plackett-luce-model, reinforcement-learning, reward-modeling