Classification Basics
The Softmax Function and its Derivative
Throughout, n indexes the softmax output vector and k indexes the softmax input vector. A bare sum symbol in the formulas below always means a sum over all input indices k.
The softmax function is defined as
\[f(x)[n] = y[n] = \frac{e^{x[n]}}{\sum e^{x[k]}}\]

The derivative of \(y[n]\) with respect to \(x[k]\) has to be split into two cases.
The case in which n equals k
\[\frac{\partial y[n]}{\partial x[k]} = \frac{ e^{x[n]} \sum e^{x[k]} - e^{x[n]} e^{x[k]} }{ \left( \sum e^{x[k]} \right)^2 }\] \[= \frac{ e^{x[n]} }{ \sum e^{x[k]} } - \frac{ e^{x[n]} }{ \sum e^{x[k]} } \cdot \frac{ e^{x[k]} }{ \sum e^{x[k]} }\] \[= \frac{ e^{x[n]} }{ \sum e^{x[k]} } \left( 1 - \frac{ e^{x[k]} }{ \sum e^{x[k]} } \right)\] \[= y[n] \, (1 - y[k])\]The case in which n and k are not equal
\[\frac{\partial y[n]}{\partial x[k]} = \frac{ 0 \cdot \sum e^{x[k]} - e^{x[n]} e^{x[k]} }{ \left( \sum e^{x[k]} \right)^2 }\] \[= 0 - \frac{ e^{x[n]} }{ \sum e^{x[k]} } \cdot \frac{ e^{x[k]} }{ \sum e^{x[k]} }\] \[= \frac{ e^{x[n]} }{ \sum e^{x[k]} } \left( 0 - \frac{ e^{x[k]} }{ \sum e^{x[k]} } \right)\] \[= y[n] \, (0 - y[k])\]Both cases can be written as a single expression, \(\frac{\partial y[n]}{\partial x[k]} = y[n] \, (\delta_{nk} - y[k])\), where \(\delta_{nk}\) equals 1 if n = k and 0 otherwise.

XOR Classification
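Before wiring up the XOR models below, the derivative can be sanity-checked numerically. The following is a quick sketch in plain NumPy (independent of the npnn library used in this section; the example input values are arbitrary), comparing the analytic Jacobian y[n](δ_nk − y[k]) against finite differences:

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_jacobian(y):
    # J[n, k] = y[n] * (delta_nk - y[k]), i.e. both cases derived above
    return np.diag(y) - np.outer(y, y)

x = np.array([1.0, 2.0, 0.5])
y = softmax(x)
J = softmax_jacobian(y)

# central finite-difference approximation of every partial derivative
eps = 1e-6
J_num = np.zeros((3, 3))
for k in range(3):
    d = np.zeros(3)
    d[k] = eps
    J_num[:, k] = (softmax(x + d) - softmax(x - d)) / (2 * eps)

print(np.allclose(J, J_num, atol=1e-8))  # True
```

The diagonal of J holds the n = k case, every off-diagonal entry the n ≠ k case.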
import numpy_neural_network as npnn
import npnn_datasets

model = npnn.Sequential()
model.layers = [
    npnn.Dense(2, 4),      # 2 XOR inputs -> 4 hidden units
    npnn.LeakyReLU(4),
    npnn.Dense(4, 4),
    npnn.LeakyReLU(4),
    npnn.Dense(4, 1),      # single output unit
    npnn.Sigmoid(1)        # squashes the output into (0, 1)
]

loss_layer = npnn.loss_layer.BinaryCrossEntropyLoss(1)
optimizer = npnn.optimizer.Adam(alpha=5e-3)

dataset = npnn_datasets.XORBinClasses()

optimizer.norm = dataset.norm
optimizer.model = model
optimizer.model.chain = loss_layer   # attach the loss to the model chain
XOR Classification using Sigmoid + Binary-Cross-Entropy-Loss
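The configuration above ends in a single Sigmoid unit trained with a binary cross-entropy loss. As a minimal sketch of what that pairing computes (plain NumPy, independent of npnn's internals, which are not shown here; the example values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y, t, eps=1e-12):
    # binary cross-entropy, summed over the batch;
    # y are sigmoid outputs in (0, 1), t are targets in {0, 1}
    y = np.clip(y, eps, 1.0 - eps)
    return -(t * np.log(y) + (1.0 - t) * np.log(1.0 - y)).sum()

z = np.array([0.3, -1.2, 2.0, -0.4])   # pre-activations of the output unit
t = np.array([1.0, 0.0, 1.0, 0.0])     # binary targets

y = sigmoid(z)

# chained through the sigmoid, the gradient of the summed BCE with
# respect to the pre-activation z collapses to the difference y - t
grad_z = y - t
```

This clean `y - t` gradient is the main reason Sigmoid and binary cross-entropy are used together.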
import numpy_neural_network as npnn
import npnn_datasets

model = npnn.Sequential()
model.layers = [
    npnn.Dense(2, 4),      # 2 XOR inputs -> 4 hidden units
    npnn.LeakyReLU(4),
    npnn.Dense(4, 4),
    npnn.LeakyReLU(4),
    npnn.Dense(4, 2),      # two output units, one per class
    npnn.Softmax(2)        # turns the outputs into a probability distribution
]

loss_layer = npnn.loss_layer.CrossEntropyLoss(2)
optimizer = npnn.optimizer.Adam(alpha=1e-2)

dataset = npnn_datasets.XORTwoClasses()

optimizer.norm = dataset.norm
optimizer.model = model
optimizer.model.chain = loss_layer   # attach the loss to the model chain
XOR Classification using Softmax + Cross-Entropy-Loss
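Analogously to the Sigmoid case, chaining the softmax Jacobian derived at the top of this section with the cross-entropy gradient collapses to `y - t`, which is why this pairing is standard for multi-class outputs. A minimal NumPy sketch (again independent of npnn; the input and target values are made up for illustration):

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(y, t, eps=1e-12):
    # cross-entropy between predicted distribution y and one-hot target t
    return -(t * np.log(np.clip(y, eps, 1.0))).sum()

x = np.array([0.5, -0.2])   # pre-activations of the two output units
t = np.array([1.0, 0.0])    # one-hot target: class 0

y = softmax(x)

# softmax Jacobian y[n](delta_nk - y[k]) chained with the cross-entropy
# gradient -t[n]/y[n] gives the gradient w.r.t. the pre-activations x
grad_x = y - t
```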