Softmax is defined as:
$
\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K}e^{z_j}}
$
where $z_i$ are the elements of the input vector and $e$ is Euler's number, the base of the exponential function. The denominator is the normalization term, which ensures that the output values sum to $1$ and thus form a valid probability distribution. Softmax normalizes the weights, but the exponential amplifies large values relative to small ones, as the following figure[^3] illustrates:

[^3]: [Attention Approximates Sparse Distributed Memory](https://www.youtube.com/watch?v=THIIk7LR9_8)
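The definition above can be sketched directly in NumPy. One practical detail not visible in the formula: implementations typically subtract the maximum element before exponentiating, which leaves the result unchanged (softmax is shift-invariant) but avoids overflow for large inputs. This is an illustrative sketch, not code from any particular library:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Subtracting the max is safe because softmax(z) == softmax(z + c)
    # for any constant c; it keeps exp() from overflowing.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)
# The outputs sum to 1 and preserve the ordering of the inputs,
# with the exponential widening the gaps between them.
```

Note how the largest input receives a disproportionately large share of the probability mass, which is exactly the amplification effect described above.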