Detailed Results of the Foundation Benchmark

Table 5 reports the accuracy of each audio-language model on every task in the foundation benchmark. Speaker Gender Recognition and Synthesized Voice Detection are binary-choice tasks, so random guessing yields an expected accuracy of 50%; every other task requires a selection from four options, giving a random baseline of 25%. Accuracy close to these baselines suggests a lack of proficiency in the respective task.


This content originally appeared on HackerNoon and was authored by Benchmarking in Business Technology and Software

:::info Authors:

(1) Qian Yang, Zhejiang University, Equal contribution. This work was conducted during Qian Yang’s internship at Alibaba Group;

(2) Jin Xu, Alibaba Group, Equal contribution;

(3) Wenrui Liu, Zhejiang University;

(4) Yunfei Chu, Alibaba Group;

(5) Xiaohuan Zhou, Alibaba Group;

(6) Yichong Leng, Alibaba Group;

(7) Yuanjun Lv, Alibaba Group;

(8) Zhou Zhao, Alibaba Group, corresponding author (zhaozhou@zju.edu.cn);

(9) Yichong Leng, Zhejiang University;

(10) Chang Zhou, Alibaba Group, corresponding author (ericzhou.zc@alibaba-inc.com);

(11) Jingren Zhou, Alibaba Group.

:::

Abstract and 1. Introduction

2 Related Work

3 AIR-Bench and 3.1 Overview

3.2 Foundation Benchmark

3.3 Chat Benchmark

3.4 Evaluation Strategy

4 Experiments

4.1 Models

4.2 Main Results

4.3 Human Evaluation and 4.4 Ablation Study of Positional Bias

5 Conclusion and References

A Detailed Results of Foundation Benchmark

A Detailed Results of Foundation Benchmark

In Table 5, we report the performance of each model on every task in the foundation benchmark. With the exception of Speaker Gender Recognition and Synthesized Voice Detection, which are binary-choice tasks, all tasks require selecting one of four options. Random guessing would therefore achieve an expected accuracy of 50% on those two tasks and 25% on the remaining ones. Consequently, any accuracy close to these random baselines indicates that the model has no discernible proficiency in the respective task.
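The random baselines above follow from uniform guessing over k options, which gives an expected accuracy of 1/k. The sketch below is illustrative only and is not part of the paper's evaluation code: the function names and the example counts are hypothetical. It computes the chance baseline and, as one possible way to judge whether an accuracy is "close to" chance, a one-sided exact binomial tail probability.

```python
import math


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 2:
        raise ValueError("a multiple-choice task needs at least two options")
    return 1.0 / num_options


def p_value_above_chance(correct: int, total: int, num_options: int) -> float:
    """P(X >= correct) when each of `total` answers is a uniform random guess.

    Uses log-space binomial terms so large question counts do not overflow.
    """
    p = chance_accuracy(num_options)
    tail = 0.0
    for k in range(correct, total + 1):
        log_pmf = (
            math.lgamma(total + 1)
            - math.lgamma(k + 1)
            - math.lgamma(total - k + 1)
            + k * math.log(p)
            + (total - k) * math.log1p(-p)
        )
        tail += math.exp(log_pmf)
    return min(tail, 1.0)


if __name__ == "__main__":
    # Hypothetical counts, not taken from Table 5.
    print("four-option baseline:", chance_accuracy(4))
    print("binary baseline:", chance_accuracy(2))
    print("p-value for 270/1000 on a four-option task:",
          p_value_above_chance(270, 1000, 4))
```

A small tail probability means the observed accuracy is unlikely under pure guessing; a large one means the result cannot be distinguished from the random baseline at that sample size.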

Table 5: The accuracy of each model across all tasks in the foundation benchmark.


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
