Textbooks are All You Need: Data Pruning for Unbiased Performance Evaluation

In this study, researchers from Microsoft introduce phi-1, a new large language model for code that is significantly smaller than competing models.


This content originally appeared on HackerNoon and was authored by Knapsack

:::info Authors:

(1) Suriya Gunasekar, Microsoft Research;

(2) Yi Zhang, Microsoft Research;

(3) Jyoti Aneja, Microsoft Research;

(4) Caio César Teodoro Mendes, Microsoft Research;

(5) Allie Del Giorno, Microsoft Research;

(6) Sivakanth Gopi, Microsoft Research;

(7) Mojan Javaheripi, Microsoft Research;

(8) Piero Kauffmann, Microsoft Research;

(9) Gustavo de Rosa, Microsoft Research;

(10) Olli Saarikivi, Microsoft Research;

(11) Adil Salim, Microsoft Research;

(12) Shital Shah, Microsoft Research;

(13) Harkirat Singh Behl, Microsoft Research;

(14) Xin Wang, Microsoft Research;

(15) Sébastien Bubeck, Microsoft Research;

(16) Ronen Eldan, Microsoft Research;

(17) Adam Tauman Kalai, Microsoft Research;

(18) Yin Tat Lee, Microsoft Research;

(19) Yuanzhi Li, Microsoft Research.

:::

5 Data pruning for unbiased performance evaluation

In Figure 2.1, we see that training on CodeExercises leads to a substantial boost in the performance of the model on the HumanEval benchmark. To investigate this boost, we propose to prune the CodeExercises dataset by removing files that are “similar” to those in HumanEval. This process can be viewed as a “strong form” of data decontamination. We then retrain our model on the pruned data and still observe strong performance on HumanEval. In particular, even after aggressively pruning more than 40% of the CodeExercises dataset (a cut that removes even files only vaguely similar to HumanEval; see Appendix C), the retrained phi-1 still outperforms StarCoder.
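To make the pruning idea concrete, here is a minimal sketch of similarity-based pruning. It is not the authors' pipeline: the similarity measure (a character-level ratio from Python's `difflib`), the function names `similarity` and `prune`, and the threshold value are all assumptions chosen for illustration. The actual similarity criteria are the ones described in Appendix C.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1].

    Assumption: a stand-in measure for illustration only,
    not the criterion used in the paper.
    """
    return SequenceMatcher(None, a, b).ratio()


def prune(train_files, benchmark_files, threshold):
    """Keep training files whose best similarity to any benchmark
    file is strictly below `threshold` (hypothetical parameter)."""
    kept = []
    for candidate in train_files:
        best = max(
            (similarity(candidate, ref) for ref in benchmark_files),
            default=0.0,
        )
        if best < threshold:
            kept.append(candidate)
    return kept


# Toy data: one near-duplicate of the benchmark problem, one unrelated file.
train = [
    "def add(a, b):\n    return a + b\n",
    "def is_palindrome(s):\n    return s == s[::-1]\n",
]
benchmark = ["def add(x, y):\n    return x + y\n"]

kept = prune(train, benchmark, threshold=0.8)
print(f"kept {len(kept)} of {len(train)} files")
```

Lowering the threshold prunes more aggressively, which mirrors the idea of removing even files that are only vaguely similar to the benchmark. Retraining on the kept files and re-evaluating is the step that makes this a "strong form" of decontamination.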

We believe that such a data pruning experiment is a fair way to evaluate performance, and is more insightful than the standard “contamination” studies in the literature, which are usually based on measures of overlap between training and test data (e.g., Section 4.8 of [AON+21]). For the sake of completeness, we start this section by conducting a standard contamination experiment, which shows that CodeExercises is not contaminated by HumanEval in this standard sense.
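For contrast, an overlap-based contamination check can be sketched as follows. This is a hedged illustration: the whitespace tokenization, the n-gram size, and the helper names `ngrams` and `overlap_fraction` are assumptions, not the exact procedure of the paper or of [AON+21].

```python
def ngrams(text: str, n: int) -> set:
    """Set of whitespace-token n-grams in `text` (tokenization is an assumption)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(test_doc: str, train_docs, n: int) -> float:
    """Fraction of the test document's n-grams that also occur in the training set."""
    test_grams = ngrams(test_doc, n)
    if not test_grams:
        return 0.0  # document shorter than n tokens: nothing to compare
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return len(test_grams & train_grams) / len(test_grams)


# Toy example with a small n; real studies typically use longer n-grams.
test_doc = "return the sum of two numbers a and b"
train_docs = ["compute the sum of two integers"]
print(overlap_fraction(test_doc, train_docs, n=3))
```

A check like this only catches near-verbatim reuse. Two files that solve the same problem with different wording can share no long n-grams at all, which is why the pruning experiment is the more demanding test.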


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::


[1] Developing rigorous sets of tests can be a significant undertaking, as demonstrated by [LXWZ23].




Knapsack | Sciencx (2024-09-12T11:45:18+00:00) Textbooks are All You Need: Data Pruning for Unbiased Performance Evaluation. Retrieved from https://www.scien.cx/2024/09/12/textbooks-are-all-you-need-data-pruning-for-unbiased-performance-evaluation/

