The Impact of Hyperparameters on Adversarial Training Performance

The Hyperparameter Tuning appendix details the critical role of the hyperparameters α, β, and γ in the NEO-KD objective function. Extreme values can hinder adversarial training; the best (α, β) pair was found to be (3, 1), achieving the highest adversarial test accuracy against both max-average and average attacks. The exit-balancing parameter γ is set to [1, 1, 1, 1.5, 1.5, 1.5, 1.5], distilling more knowledge to later exits. Together, these results confirm the importance of balanced hyperparameter selection for strong performance.


This content originally appeared on HackerNoon and was authored by Writings, Papers and Blogs on Text Models

:::info Authors:

(1) Seokil Ham, KAIST;

(2) Jungwuk Park, KAIST;

(3) Dong-Jun Han, Purdue University;

(4) Jaekyun Moon, KAIST.

:::

Abstract and 1. Introduction

2. Related Works

3. Proposed NEO-KD Algorithm and 3.1 Problem Setup: Adversarial Training in Multi-Exit Networks

3.2 Algorithm Description

4. Experiments and 4.1 Experimental Setup

4.2. Main Experimental Results

4.3. Ablation Studies and Discussions

5. Conclusion, Acknowledgement and References

A. Experiment Details

B. Clean Test Accuracy and C. Adversarial Training via Average Attack

D. Hyperparameter Tuning

E. Discussions on Performance Degradation at Later Exits

F. Comparison with Recent Defense Methods for Single-Exit Networks

G. Comparison with SKD and ARD and H. Implementations of Stronger Attacker Algorithms

D Hyperparameter Tuning

In the NEO-KD objective function, there are three hyperparameters (α, β, γ): α and β control the amount of knowledge distilled from NKD and EOKD, respectively, while γ increases the amount of knowledge distilled to later exits.

D.1 Hyperparameter (α, β)

Extreme values of α and β can destroy ideal adversarial training. Too large an α makes NKD too strong, which results in high dependency among submodels, while too small an α makes NKD too weak to distill enough knowledge to the student exits. Similarly, too large a β makes EOKD too strong, which can interrupt adversarial training by distilling only sparse knowledge (the likelihoods of most classes are zero), while too small a β makes EOKD too weak to mitigate dependency among submodels. We select α and β values in the range [0.35, 3] and measure adversarial test accuracy averaged over all exits. The candidate (α, β) pairs are (0.35, 1), (1, 0.35), (0.35, 0.35), (0.5, 1), (1, 0.5), (0.5, 0.5), (1, 1), (2, 1), (1, 2), (2, 2), (3, 1), (1, 3), and (3, 3). With (α, β) = (3, 1), NEO-KD achieves 28.96% adversarial test accuracy against the max-average attack and 22.88% against the average attack, the highest among all candidate pairs. Therefore, we use (3, 1) as the (α, β) pair in our experiments.
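The selection procedure above is a simple grid search: evaluate each candidate (α, β) pair and keep the one with the highest exit-averaged adversarial test accuracy. A minimal sketch, where `eval_adv_accuracy` is a hypothetical stand-in for running adversarial evaluation and averaging adversarial test accuracy over all exits:

```python
# Candidate (alpha, beta) pairs from the appendix.
CANDIDATES = [
    (0.35, 1), (1, 0.35), (0.35, 0.35),
    (0.5, 1), (1, 0.5), (0.5, 0.5),
    (1, 1), (2, 1), (1, 2), (2, 2),
    (3, 1), (1, 3), (3, 3),
]

def select_alpha_beta(eval_adv_accuracy, candidates=CANDIDATES):
    """Return the (alpha, beta) pair with the highest exit-averaged
    adversarial test accuracy, as measured by eval_adv_accuracy."""
    return max(candidates, key=lambda ab: eval_adv_accuracy(*ab))
```

In the paper's experiments this search selected (3, 1); the sketch only formalizes the argmax over the candidate grid, not the (expensive) adversarial evaluation itself.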

D.2 Hyperparameter γ

Since the prediction difference between the last exit (the teacher prediction) and later exits is smaller than that between the last exit and early exits, later exits benefit less from knowledge distillation. We therefore assign slightly larger weights to later exits so that more knowledge is distilled to them than to early exits. The candidate γ values are [1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1.5, 1.5, 1.5, 1.5], and [1, 1, 1, 1.7, 1.7, 1.7, 1.7]. Distilling 1.5 times more knowledge to later exits yields the best results: NEO-KD achieves 28.96% adversarial test accuracy against the max-average attack and 22.88% against the average attack, compared to using equal weights for all exits (28.13% for max-average and 21.66% for average attack) or distilling 1.7 times more knowledge to later exits (28.68% for max-average and 22.58% for average attack). The reported adversarial test accuracy is the average over all exits. Therefore, we use γ = [1, 1, 1, 1.5, 1.5, 1.5, 1.5] in our experiments. This result shows that an appropriately chosen exit-balancing parameter γ is needed for high performance.
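The exit-balancing weights can be sketched as a simple weighted sum of per-exit distillation losses. Here `exit_kd_losses` is a hypothetical list of per-exit KD loss values (one per exit, for a seven-exit network); the four later exits are scaled by 1.5 relative to the three early exits:

```python
# Exit-balancing weights selected in the appendix: early exits get
# weight 1, later exits get weight 1.5.
GAMMA = [1, 1, 1, 1.5, 1.5, 1.5, 1.5]

def weighted_kd_loss(exit_kd_losses, gamma=GAMMA):
    """Combine per-exit distillation losses using the exit-balancing
    weights gamma (one weight per exit)."""
    assert len(exit_kd_losses) == len(gamma), "one weight per exit"
    return sum(g * loss for g, loss in zip(gamma, exit_kd_losses))
```

This is only a sketch of how γ enters the objective, assuming the per-exit losses are already computed; the actual NEO-KD loss also includes the adversarial training term and the α- and β-weighted NKD/EOKD components.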


:::info This paper is available on arxiv under CC 4.0 license.

:::




