Activation functions in PyTorch (4)


*Memos:

My post explains GELU() and Mish().

My post explains SiLU() and Softplus().

My post explains Step function, Identity and ReLU.

My post explains Leaky ReLU, PReLU and FReLU.

My post explains ELU, SELU and CELU.




(1) GELU (Gaussian Error Linear Unit):

  • can convert an input value (x) to an output value by weighting x with its cumulative probability under a standard Gaussian distribution, optionally using a Tanh approximation. *The output is never exactly 0 except when x = 0.
  • 's formula is y = x * Φ(x) = 0.5 * x * (1 + erf(x / √2)), or, with the Tanh approximation, y = 0.5 * x * (1 + tanh(√(2/π) * (x + 0.044715 * x^3))). *Both give almost the same results.
  • is GELU() in PyTorch. *See the usage sketch after this list.
  • is used in:
    • Transformer. *Transformer() in PyTorch.
    • NLP (Natural Language Processing) models based on the Transformer, such as ChatGPT and BERT (Bidirectional Encoder Representations from Transformers). *Strictly speaking, ChatGPT and BERT are based on Large Language Models (LLMs), which are in turn based on the Transformer.
  • 's pros:
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of complex operations involving Erf (Error function) or Tanh.
  • 's graph in Desmos: [GELU graph image]
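
A minimal usage sketch of GELU() (the input values below are chosen just for illustration, and the printed numbers are approximate). It applies both the exact Erf-based form and the Tanh approximation to the same tensor; the approximate='tanh' argument assumes PyTorch 1.12 or later.

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

gelu_exact = nn.GELU()                   # y = x * Φ(x), computed with Erf
gelu_tanh = nn.GELU(approximate='tanh')  # Tanh approximation (PyTorch 1.12+)

print(gelu_exact(x))  # ≈ tensor([-0.0040, -0.1587,  0.0000,  0.8413,  2.9960])
print(gelu_tanh(x))   # almost the same values as the exact version
```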

(2) Mish:

  • can convert an input value (x) to an output value by x * Tanh(Softplus(x)). *The output is never exactly 0 except when x = 0.
  • 's formula is y = x * tanh(softplus(x)) = x * tanh(log(1 + e^x)).
  • is Mish() in PyTorch. *See the usage sketch after this list.
  • 's pros:
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the Tanh and Softplus operations.
  • 's graph in Desmos: [Mish graph image]
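
A minimal usage sketch of Mish() (example input values only, approximate outputs in the comments). The built-in layer is compared against the written-out formula x * Tanh(Softplus(x)); nn.Mish assumes PyTorch 1.9 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

mish = nn.Mish()
y_builtin = mish(x)
y_manual = x * torch.tanh(F.softplus(x))  # x * tanh(log(1 + e^x))

print(y_builtin)                            # ≈ tensor([-0.1456, -0.3034,  0.0000,  0.8651,  2.9865])
print(torch.allclose(y_builtin, y_manual))  # True
```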

(3) SiLU (Sigmoid-Weighted Linear Unit):

  • can convert an input value (x) to an output value by x * Sigmoid(x). *The output is never exactly 0 except when x = 0.
  • 's formula is y = x * Sigmoid(x) = x / (1 + e^(-x)).
  • is also called Swish.
  • is SiLU() in PyTorch. *See the usage sketch after this list.
  • 's pros:
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of Sigmoid.
  • 's graph in Desmos: [SiLU graph image]
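
A minimal usage sketch of SiLU() (example input values only, approximate outputs in the comments). The built-in layer is compared against the written-out formula x * Sigmoid(x).

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

silu = nn.SiLU()
y_builtin = silu(x)
y_manual = x * torch.sigmoid(x)  # y = x / (1 + e^(-x))

print(y_builtin)                            # ≈ tensor([-0.1423, -0.2689,  0.0000,  0.7311,  2.8577])
print(torch.allclose(y_builtin, y_manual))  # True
```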

(4) Softplus:

  • can convert an input value (x) to an output value between 0 and ∞. *0 is exclusive (the output is always greater than 0).
  • 's formula is y = log(1 + e^x).
  • is Softplus() in PyTorch. *See the usage sketch after this list.
  • 's pros:
    • It constrains outputs to strictly positive values.
    • The convergence is stable because the function is smooth.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Exploding Gradient Problem.
    • It avoids Dying ReLU Problem.
  • 's cons:
    • It's computationally expensive because of the log and exponential operations.
  • 's graph in Desmos: [Softplus graph image]
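
A minimal usage sketch of Softplus() (example input values only, approximate outputs in the comments). beta and threshold are the layer's two arguments, shown here with their default values; note that every output is strictly greater than 0.

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

# Defaults: beta=1, threshold=20 (inputs above the threshold are treated linearly for stability)
softplus = nn.Softplus(beta=1.0, threshold=20.0)
y = softplus(x)

print(y)              # ≈ tensor([0.0486, 0.3133, 0.6931, 1.3133, 3.0486])
print((y > 0).all())  # tensor(True): Softplus never outputs exactly 0
```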

