Softmax

Softmax Definition

The softmax function is a mathematical operation that transforms a vector of arbitrary real numbers into a probability distribution, that is, a set of strictly positive numbers that sum to 1.

Given a vector \( z = (z_1, z_2, \dots, z_K) \) of \( K \) real numbers, the softmax of the \( i \)-th element is defined as:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

Here:

  • the numerator \( e^{z_i} \) exponentiates each element, amplifying larger ones,
  • the denominator \( \sum_{j=1}^{K} e^{z_j} \) normalizes the vector so that all components sum to 1.

Thus, each \( \mathrm{softmax}(z)_i \in (0, 1) \) and \( \sum_{i=1}^{K} \mathrm{softmax}(z)_i = 1 \).
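As a concrete reference, here is a minimal NumPy sketch of this definition (the function name softmax and the example logits are our own choices):

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D array of logits."""
    exp_z = np.exp(z)           # exponentiate each element
    return exp_z / exp_z.sum()  # normalize so the outputs sum to 1

# Three arbitrary logits become a probability distribution:
print(softmax(np.array([1.0, 2.0, 3.0])))  # ~ [0.090, 0.245, 0.665]
```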

Characteristics and Intuition

Probability Interpretation

Softmax outputs can be seen as the model’s confidence for each class. For example, in a 3-class classifier, if \( \mathrm{softmax}(z) = (0.8, 0.15, 0.05) \), the model believes the input belongs to the first class with 80% probability.

Amplifies Differences

Exponentiation makes large input values much larger and small ones much smaller. Hence, even small differences in logits (the pre-softmax scores) become decisive: the “winner” logit dominates, while others shrink toward zero.
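For instance, a quick NumPy sketch (the logit values are chosen purely for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

# A gap of 1 between logits already produces a clear favorite...
print(softmax(np.array([2.0, 1.0])))    # ~ [0.731, 0.269]
# ...and multiplying the same logits by 10 makes it nearly one-hot.
print(softmax(np.array([20.0, 10.0])))  # ~ [0.99995, 0.00005]
```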

Shift Invariance

Adding the same constant \( c \) to every \( z_i \) does not change the output: \( \mathrm{softmax}(z + c)_i = \mathrm{softmax}(z)_i \).

This property is often used for numerical stability (by subtracting the max logit before computing softmax).
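A minimal sketch of that trick, assuming NumPy (softmax_stable is an illustrative name):

```python
import numpy as np

def softmax_stable(z):
    """Numerically stable softmax: shift logits by their max first."""
    shifted = z - z.max()           # shift invariance: output is unchanged
    exp_z = np.exp(shifted)         # largest exponent is now e^0 = 1
    return exp_z / exp_z.sum()

z = np.array([1000.0, 1001.0, 1002.0])
# np.exp(z) alone would overflow to inf; the shifted version is fine.
print(softmax_stable(z))  # ~ [0.090, 0.245, 0.665]
```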

Smooth and Differentiable

Softmax is continuous and differentiable, perfect for gradient-based optimization in neural networks.
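In fact, the derivative has a simple closed form, worth noting for completeness (writing \( s_i = \mathrm{softmax}(z)_i \) and using the Kronecker delta \( \delta_{ij} \)):

$$\frac{\partial s_i}{\partial z_j} = s_i \left( \delta_{ij} - s_j \right)$$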

Used in Classification Tasks

It’s the standard final layer in multi-class classification. The model outputs raw logits → softmax converts them into probabilities → cross-entropy loss compares them to the true class.
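A minimal NumPy sketch of that pipeline for a single example (the logits and class index here are made up; real frameworks typically fuse softmax and cross-entropy into one numerically stable operation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # stable variant from above
    return e / e.sum()

logits = np.array([1.0, 3.0, 0.5])   # raw model outputs
probs = softmax(logits)              # ~ [0.111, 0.821, 0.067]
true_class = 1                       # index of the correct class
loss = -np.log(probs[true_class])    # cross-entropy, ~ 0.197
print(probs, loss)
```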

Example

Suppose a model gives logits \( z = (2.0, 1.0, 0.1) \). Computing the softmax for each one we obtain: \( \mathrm{softmax}(z) \approx (0.659,\ 0.242,\ 0.099) \).
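Step by step (values rounded):

$$e^{2.0} \approx 7.389, \qquad e^{1.0} \approx 2.718, \qquad e^{0.1} \approx 1.105, \qquad \text{sum} \approx 11.212$$

$$\mathrm{softmax}(z) \approx \left( \tfrac{7.389}{11.212},\ \tfrac{2.718}{11.212},\ \tfrac{1.105}{11.212} \right) \approx (0.659,\ 0.242,\ 0.099)$$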

Notice that \( z_1 = 2.0 \) is the one with the highest probability (\( \approx 0.659 \)), so the model would predict class 1.

In Summary

| Property | Meaning |
| --- | --- |
| Input | Vector of logits (real numbers) |
| Output | Probability distribution |
| Range | Each output in (0, 1); outputs sum to 1 |
| Purpose | Converts scores into interpretable probabilities |
| Common Use | Output layer in multi-class classification (with cross-entropy loss) |