Activation Function

  • Determines a neuron’s output value and whether the neuron is considered “activated”.
  • Enables neural networks to learn nonlinear relationships and make predictions on unseen data.
  • Common choices (sigmoid, tanh, ReLU, softmax) have different output ranges and suit different tasks.

Activation functions are the functions in a neural network that determine the output of a node, or neuron, and whether that neuron should be activated, based on the input it receives.

Activation functions are essential components of neural networks because they decide a neuron’s output for a given input and thereby allow the network to learn from data and make predictions on unseen inputs. Different activation functions have distinct characteristics (such as output range and computational behavior) that make them more or less suitable for particular tasks.

Common activation functions:

  • Sigmoid: produces outputs between 0 and 1.
  • Tanh (hyperbolic tangent): produces outputs between -1 and 1.
  • ReLU (Rectified Linear Unit): outputs the maximum of 0 and the input.
  • Softmax: produces a probability distribution over classes.

Sigmoid

  • Usage: binary classification tasks.
  • Outputs: between 0 and 1.
  • Numerical examples: input -10 → output ≈ 0; input 10 → output ≈ 1 (see the check below).
  • Formula:

f(x) = \frac{1}{1 + e^{-x}}
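
A minimal sketch of the sigmoid formula and the two example values above, assuming NumPy (the source does not prescribe any particular library):

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)): squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-10.0))  # ~4.5e-05, effectively 0
print(sigmoid(10.0))   # ~0.99995, effectively 1
```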

Tanh (hyperbolic tangent)

  • Usage: classification and regression tasks.
  • Outputs: between -1 and 1.
  • Numerical examples: input -10 → output ≈ -1; input 10 → output ≈ 1 (see the check below).
  • Formula:

f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
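
The same kind of check for tanh, comparing the explicit formula with NumPy's built-in np.tanh (again assuming NumPy):

```python
import numpy as np

def tanh(x):
    # f(x) = (e^x - e^(-x)) / (e^x + e^(-x)): squashes inputs into (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

print(tanh(-10.0), np.tanh(-10.0))  # both ~ -1.0
print(tanh(10.0), np.tanh(10.0))    # both ~ 1.0
```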

ReLU (Rectified Linear Unit)

  • Usage: regression and classification tasks.
  • Behavior: outputs the maximum of 0 and the input.
  • Numerical examples: input -10 → output 0; input 10 → output 10 (see the check below).
  • Formula:

f(x) = \max(0, x)
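
A one-line check of the ReLU behavior described above, again as a NumPy sketch:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): negative inputs are clipped to 0, positive inputs pass through unchanged
    return np.maximum(0.0, x)

print(relu(-10.0))  # 0.0
print(relu(10.0))   # 10.0
```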

Softmax

  • Usage: classification tasks.
  • Behavior: outputs the probability of each class in a classification task.
  • Numerical example: with three classes, softmax turns the three raw scores into three probabilities that sum to 1 (see the sketch below).
  • Formula:

f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
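
A minimal softmax sketch for a hypothetical three-class case; the raw scores below are made up for illustration:

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # the result is unchanged because softmax is shift-invariant.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw scores (logits) for three classes
probs = softmax(scores)
print(probs)        # ~[0.659, 0.242, 0.099]
print(probs.sum())  # 1.0
```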

Typical usage by task (a combined sketch follows this list):

  • Sigmoid: binary classification.
  • Tanh: classification and regression.
  • ReLU: regression and classification.
  • Softmax: multi-class classification (outputs class probabilities).
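
To show how these choices fit together, here is a hedged sketch of a single forward pass through a tiny multi-class classifier that uses ReLU in the hidden layer and softmax at the output. The layer sizes and random weights are arbitrary illustrations, not values from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative network: 4 inputs -> 8 hidden units (ReLU) -> 3 classes (softmax)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = rng.normal(size=4)             # one example with 4 input features
hidden = relu(x @ W1 + b1)         # nonlinear hidden representation
probs = softmax(hidden @ W2 + b2)  # one probability per class
print(probs, probs.sum())          # three probabilities summing to 1
```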

Limitations:

  • Sigmoid: useful for binary classification but not suitable for tasks with multiple classes.
  • Tanh: can suffer from vanishing gradients, which can slow or stall training (illustrated after this list).
  • ReLU: computationally efficient but can suffer from the dying ReLU problem, where a neuron's output gets stuck at 0 and it stops learning.
  • Softmax: suitable for classification tasks but not for regression tasks.
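
A small numerical sketch of the two failure modes named above: the sigmoid and tanh gradients shrink toward 0 for large-magnitude inputs (vanishing gradients), and the ReLU gradient is exactly 0 for negative inputs, so a neuron that only ever receives negative pre-activations stops updating (dying ReLU). The input values are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, 0.0, 10.0])

# Sigmoid derivative: f'(x) = f(x) * (1 - f(x)); close to 0 when |x| is large
print(sigmoid(x) * (1.0 - sigmoid(x)))  # ~[4.5e-05, 0.25, 4.5e-05]

# Tanh derivative: f'(x) = 1 - tanh(x)^2; also close to 0 when |x| is large
print(1.0 - np.tanh(x) ** 2)            # ~[8.2e-09, 1.0, 8.2e-09]

# ReLU derivative: 1 for x > 0, otherwise 0; the zero region is where neurons can "die"
print((x > 0).astype(float))            # [0.0, 0.0, 1.0]
```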

Related terms:

  • Sigmoid
  • Tanh
  • ReLU (Rectified Linear Unit)
  • Softmax
  • Vanishing gradients
  • Dying ReLU