Softmax
Softmax definition
The softmax function is a mathematical operation that transforms a vector of arbitrary real numbers into a probability distribution — that is, a set of strictly positive numbers in (0, 1) that sum to 1.
Given a vector \( z = (z_1, z_2, \dots, z_n) \), the softmax function is defined as:

\[
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \qquad i = 1, \dots, n
\]
Here:

- the numerator \( e^{z_i} \) exponentiates each element, amplifying larger ones,
- the denominator \( \sum_{j=1}^{n} e^{z_j} \) normalizes the vector so that all components sum to 1.

Thus, each \( \mathrm{softmax}(z)_i \in (0, 1) \), and the components together sum to 1.
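The definition above can be sketched directly in Python (a minimal illustration; the function name `softmax` and the sample logits are my own):

```python
import math

def softmax(z):
    """Map a list of real-valued logits to a probability distribution."""
    exps = [math.exp(x) for x in z]   # numerator: e^{z_i} for each element
    total = sum(exps)                 # denominator: sum of all e^{z_j}
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)        # three positive values, larger logit -> larger probability
print(sum(probs))   # sums to 1 (up to floating-point error)
```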
Characteristics and Intuition
**Probability Interpretation** Softmax outputs can be seen as the model’s confidence for each class. For example, in a 3-class classifier, if the output is \( (0.8, 0.15, 0.05) \), the model believes the input belongs to the first class with 80% probability.
**Amplifies Differences** Exponentiation makes large input values much larger and small ones much smaller. Hence, even small differences in logits (the pre-softmax scores) become decisive: the “winner” logit dominates, while others shrink toward zero.
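This amplification is easy to demonstrate with a toy comparison (the logit values here are illustrative, not from the original text):

```python
import math

def softmax(z):
    # Standard softmax: exponentiate, then normalize.
    exps = [math.exp(x) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

# Two logits that differ by only 2 already produce a clear winner.
p = softmax([3.0, 1.0])
print(p)  # first class receives roughly 0.88 of the probability mass
```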
**Scale Invariance** Adding the same constant \( c \) to all \( z_i \) does not change the output:

\[
\mathrm{softmax}(z + c)_i = \frac{e^{z_i + c}}{\sum_{j} e^{z_j + c}} = \frac{e^{c}\, e^{z_i}}{e^{c} \sum_{j} e^{z_j}} = \mathrm{softmax}(z)_i
\]

This property is often used for numerical stability (by subtracting the max logit before computing softmax).
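The max-subtraction trick mentioned above might be implemented like this (a sketch; the example logits are deliberately large to show the failure mode of the naive version):

```python
import math

def stable_softmax(z):
    # Subtracting max(z) leaves the output unchanged (scale invariance)
    # but keeps every exponent <= 0, so math.exp cannot overflow.
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

# Naive exponentiation of logits this large would overflow a float;
# the stabilized version handles them without trouble.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
print(probs)
```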
**Smooth and Differentiable** Softmax is continuous and differentiable, which makes it well suited to gradient-based optimization in neural networks.
**Used in Classification Tasks** It is the standard final layer in multi-class classification: the model outputs raw logits → softmax converts them into probabilities → cross-entropy loss compares them to the true class.
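That logits → softmax → cross-entropy pipeline can be sketched end to end. This is a minimal illustration with made-up logits; the helper `cross_entropy` is my own name, not a standard API:

```python
import math

def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    # Negative log-probability assigned to the correct class:
    # small when the model is confident and right, large otherwise.
    return -math.log(probs[true_class])

logits = [2.0, 0.5, -1.0]   # raw model outputs (illustrative values)
probs = softmax(logits)
loss = cross_entropy(probs, true_class=0)
print(probs, loss)
```

Because class 0 already has the largest logit, the loss here is small; predicting a low-probability class would make it much larger.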
Example
Suppose a model gives three logits for three classes, one noticeably larger than the others. Exponentiating each logit amplifies that gap, and dividing by the sum of the exponentials yields a probability distribution.

Notice that the largest logit receives by far the largest probability, and the outputs always sum to 1.
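Worked through with illustrative logits (my own values, not from the original text), the computation looks like this:

```python
import math

logits = [2.0, 1.0, 0.1]                   # illustrative logits
exps = [math.exp(x) for x in logits]       # approx [7.389, 2.718, 1.105]
total = sum(exps)                          # approx 11.212
probs = [e / total for e in exps]          # approx [0.659, 0.242, 0.099]
print(probs, sum(probs))
```

The class with logit 2.0 ends up with roughly two-thirds of the probability mass, even though its logit is only 1 larger than the runner-up.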
In Summary
| Property | Meaning |
|---|---|
| Input | Vector of logits (real numbers) |
| Output | Probability distribution |
| Range | (0,1), sums to 1 |
| Purpose | Converts scores into interpretable probabilities |
| Common Use | Output layer in multi-class classification (with cross-entropy loss) |