Zephyr: Direct Distillation of LM Alignment: Appendix

In this study, researchers aim to produce a smaller language model that is aligned to user intent.


:::info Authors:

(1) Lewis Tunstall, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team (email: lewis@huggingface.co);

(2) Edward Beeching, Equal contribution and The H4 (Helpful, Honest, Harmless, Huggy) Team;

(3) Nathan Lambert, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(4) Nazneen Rajani, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(5) Kashif Rasul, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(6) Younes Belkada, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(7) Shengyi Huang, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(8) Leandro von Werra, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(9) Clementine Fourrier, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(10) Nathan Habib, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(11) Nathan Sarrazin, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(12) Omar Sanseviero, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(13) Alexander M. Rush, The H4 (Helpful, Honest, Harmless, Huggy) Team;

(14) Thomas Wolf, The H4 (Helpful, Honest, Harmless, Huggy) Team.

:::

A APPENDIX

A.1 QUALITATIVE EXAMPLES

To qualitatively compare responses from our dSFT and dDPO models, we choose prompts from a few domains of MT-Bench, as well as some adversarial prompts to test each model's ability to follow instructions with false premises or harmful intent. Completions for the adversarial prompts were generated with nucleus sampling (top-p = 0.95) and temperature T = 0.7.
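For concreteness, here is a minimal sketch of how such completions could be sampled with the `transformers` library at the settings above. The checkpoint name and the prompt are illustrative placeholders, not taken from the paper's evaluation harness.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint: the released Zephyr model; any chat model would do here.
model_id = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical false-premise prompt of the kind described above.
prompt = "Explain why the sky is green."
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling with the settings reported in the text: top-p = 0.95, T = 0.7.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.95,
    temperature=0.7,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```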

Figure 4: Model samples on a cherry-picked MT-Bench prompt to show the dDPO model's ability to follow math instructions.

Figure 5: Subtle mistakes in the dSFT model compared to the dDPO model, where the former refers to an "adult-sized helicopter". This prompt was cherry-picked to illustrate whether models can be confused by instructions with false premises.

Figure 6: Sample responses to prompts with harmful intent. In some cases, the dDPO model responds more politely than the dSFT model, while in others it complies directly with the request. Including red-teaming examples in the dDPO step would likely improve the model's safety capabilities.


A.2 SFT IS A REQUIRED STEP BEFORE DPO

In Table 3 we report an ablation that tests whether SFT is necessary prior to the DPO step. We observed a significant drop in both MT-Bench and AlpacaEval scores when the SFT step was skipped. After a qualitative evaluation of the MT-Bench generations, we found that the pure DPO model struggles to learn the chat template, as Figure 7 below illustrates.
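For readers unfamiliar with the objective optimized in the dDPO step, the following is a minimal sketch of the DPO loss over a batch of preference pairs, assuming per-sequence log-probabilities have already been computed. Variable names and the β value are illustrative, not taken from the paper's training code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is a tensor of summed token log-probs with shape (batch,).
    beta controls how far the policy may drift from the reference model;
    with SFT skipped, the reference is a poor prior over chat-formatted
    text, which is one intuition for why the ablation above degrades.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the reward margin between preferred and dispreferred completions.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```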

Figure 7: The pure dDPO model struggles to apply the chat template.
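To make "chat template" concrete, the snippet below renders one with the standard `transformers` API. The checkpoint and dialogue are illustrative; a model trained without the SFT step never learns to produce this structure.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "How do I bake bread?"},
]
# Renders the special role tokens (e.g. <|user|>, <|assistant|>) that
# structure the conversation for the model.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```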


:::info This paper is available on arXiv under a CC 4.0 license.

:::
