Classification Basics

The Softmax Function and its Derivative

We use n for the index into the softmax output vector and k for the index into the softmax input vector. Every sum in the formulas below runs over all input indices.

The softmax function is defined as

\[f(x)[n] = y[n] = \frac{e^{x[n]}}{\sum_{j} e^{x[j]}}\]
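The definition above translates almost directly into NumPy. The following is a minimal sketch, not the library's own implementation; the function name `softmax` and the subtraction of the maximum (a common trick that leaves the result unchanged but avoids overflow in the exponential) are assumptions added for illustration.

```python
import numpy as np

def softmax(x):
    """Softmax of a 1-D input vector x (illustrative helper)."""
    # Assumption: shift by the maximum for numerical stability.
    # The shift cancels in the fraction, so the result is unchanged.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    # Divide each exponential by the sum over all input indices.
    return exps / np.sum(exps)

y = softmax(np.array([1.0, 2.0, 3.0]))
print(y)  # all entries are positive and they sum to 1
```

Because every output is divided by the same sum, the outputs can be read as class probabilities, and the largest input always gets the largest output.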

The derivative of y[n] = softmax(x)[n] with respect to x[k] splits into two cases, depending on whether n equals k.

The case in which n equals k, where the numerator e^{x[n]} depends on x[k]:

\[\frac{\partial y[n]}{\partial x[k]} = \frac{ e^{x[n]} \sum_{j} e^{x[j]} - e^{x[n]} e^{x[k]} }{ \left( \sum_{j} e^{x[j]} \right)^2 }\]
\[= \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } - \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \frac{ e^{x[k]} }{ \sum_{j} e^{x[j]} }\]
\[= \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \cdot 1 - \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \frac{ e^{x[k]} }{ \sum_{j} e^{x[j]} }\]
\[= \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \left( 1 - \frac{ e^{x[k]} }{ \sum_{j} e^{x[j]} } \right)\]
\[= y[n] (1 - y[k])\]

The case in which n and k are not equal, where the numerator e^{x[n]} does not depend on x[k], so its derivative is 0:

\[\frac{\partial y[n]}{\partial x[k]} = \frac{ 0 \cdot \sum_{j} e^{x[j]} - e^{x[n]} e^{x[k]} }{ \left( \sum_{j} e^{x[j]} \right)^2 }\]
\[= 0 - \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \frac{ e^{x[k]} }{ \sum_{j} e^{x[j]} }\]
\[= \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \cdot 0 - \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \frac{ e^{x[k]} }{ \sum_{j} e^{x[j]} }\]
\[= \frac{ e^{x[n]} }{ \sum_{j} e^{x[j]} } \left( 0 - \frac{ e^{x[k]} }{ \sum_{j} e^{x[j]} } \right)\]
\[= y[n] (0 - y[k])\]
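Both cases can be written as one expression, y[n] (d[n,k] - y[k]), where d[n,k] is 1 if n equals k and 0 otherwise. As a sanity check, the sketch below builds the full matrix of derivatives from that expression and compares it with a finite-difference approximation. The helper names `softmax_jacobian` and `numerical_jacobian`, the step size, and the test vector are assumptions chosen for illustration, not part of the library.

```python
import numpy as np

def softmax(x):
    """Softmax of a 1-D input vector x (illustrative helper)."""
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

def softmax_jacobian(x):
    """J[n, k] = d y[n] / d x[k] = y[n] * (d[n, k] - y[k])."""
    y = softmax(x)
    # np.diag(y) supplies the y[n] * d[n, k] term,
    # np.outer(y, y) supplies the y[n] * y[k] term.
    return np.diag(y) - np.outer(y, y)

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of f at x."""
    jac = np.zeros((x.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        # Column k holds the change of every output when x[k] is perturbed.
        jac[:, k] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

x = np.array([0.5, -1.0, 2.0])
analytic = softmax_jacobian(x)
numeric = numerical_jacobian(softmax, x)
print(np.allclose(analytic, numeric, atol=1e-8))
```

The diagonal entries correspond to the case n = k and are positive; the off-diagonal entries correspond to the case where n and k differ and are negative. Each column sums to zero because the softmax outputs always sum to 1.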

XOR Classification

import numpy_neural_network as npnn
import npnn_datasets

# Network: 2 inputs -> 4 -> 4 -> 1 output, with a sigmoid on the single output
model = npnn.Sequential()
model.layers = [
  npnn.Dense(2, 4),
  npnn.LeakyReLU(4),
  npnn.Dense(4, 4),
  npnn.LeakyReLU(4),
  npnn.Dense(4, 1),
  npnn.Sigmoid(1)
]

# Binary cross-entropy loss on the one output, Adam optimizer, XOR dataset
loss_layer = npnn.loss_layer.BinaryCrossEntropyLoss(1)
optimizer  = npnn.optimizer.Adam(alpha=5e-3)
dataset    = npnn_datasets.XORBinClasses()

# Connect the optimizer, the model and the loss layer
optimizer.norm  = dataset.norm
optimizer.model = model
optimizer.model.chain = loss_layer

XOR Classification using Sigmoid + Binary-Cross-Entropy-Loss

import numpy_neural_network as npnn
import npnn_datasets

# Network: 2 inputs -> 4 -> 4 -> 2 outputs, with a softmax over the two outputs
model = npnn.Sequential()
model.layers = [
  npnn.Dense(2, 4),
  npnn.LeakyReLU(4),
  npnn.Dense(4, 4),
  npnn.LeakyReLU(4),
  npnn.Dense(4, 2),
  npnn.Softmax(2)
]

# Cross-entropy loss over the two outputs, Adam optimizer, XOR dataset
loss_layer = npnn.loss_layer.CrossEntropyLoss(2)
optimizer  = npnn.optimizer.Adam(alpha=1e-2)
dataset    = npnn_datasets.XORTwoClasses()

# Connect the optimizer, the model and the loss layer
optimizer.norm  = dataset.norm
optimizer.model = model
optimizer.model.chain = loss_layer

XOR Classification using Softmax + Cross-Entropy-Loss
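The two XOR setups above differ in how the class is encoded at the output: one sigmoid unit with a binary cross-entropy loss, or two softmax units with a cross-entropy loss. The sketch below illustrates the difference in plain NumPy, without the npnn API. The target encodings shown here are an assumption about what XORBinClasses and XORTwoClasses might provide, and the loss functions and prediction values are hypothetical helpers written only for this example.

```python
import numpy as np

# The four XOR input points and their class (1 if exactly one input is 1).
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([0, 1, 1, 0])

# Assumed encoding for a single sigmoid output: one target value per sample.
binary_targets = labels.reshape(-1, 1).astype(float)

# Assumed encoding for a two-unit softmax output: one one-hot row per sample.
one_hot_targets = np.eye(2)[labels]

def binary_cross_entropy(p, t, eps=1e-12):
    """Mean binary cross-entropy for predictions p and targets t."""
    p = np.clip(p, eps, 1 - eps)  # keep log() away from 0
    return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))

def cross_entropy(p, t, eps=1e-12):
    """Mean cross-entropy for row-wise probabilities p and one-hot targets t."""
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(t * np.log(p), axis=1))

# Hypothetical predictions: probability of class 1 for each sample.
p_class1 = np.array([[0.1], [0.8], [0.7], [0.2]])
# The same predictions written as two probabilities per sample.
p_two = np.hstack([1 - p_class1, p_class1])

bce = binary_cross_entropy(p_class1, binary_targets)
ce = cross_entropy(p_two, one_hot_targets)
print(bce, ce)
```

When the two-class probabilities are p and 1 - p, both losses give the same value, so for two classes the choice between the setups is mostly one of output encoding.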