Visualization of a simple neural network for educational purposes.

What is this?

This is an implementation of a neural network with back-propagation. There aren't any special tricks; it's as simple a neural network as it gets.

Cost function

The cost is defined as \(C = \frac{1}{2 \times sampleCnt}\sum^{sampleCnt}_{m=1}\sum^{outputSize}_{n=1}(neuron_n - target_n)^2\).
In words: the error of a single output neuron is \((value - target)^2\). To get the error of the neural network for one training sample, you simply add up the errors of all output neurons. The total cost is then defined as the average error over all training samples.
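
The cost formula above can be sketched in a few lines of plain Python. The example values below (three training samples, two output neurons) are made up purely for illustration:

```python
# Cost from the formula above: sum squared errors per sample,
# then average over samples (with the extra 1/2 factor).
outputs = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]  # network outputs per sample
targets = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]  # desired outputs per sample

sample_cnt = len(outputs)
cost = sum(
    sum((o - t) ** 2 for o, t in zip(out, tgt))
    for out, tgt in zip(outputs, targets)
) / (2 * sample_cnt)
print(cost)
```

Per-sample errors here are 0.05, 0.05, and 0.5, so the cost works out to \(0.6 / 6 = 0.1\).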

Forward propagation

Let's say that the value of a connection is the connection's weight (how wide it is) times the value of the first connected neuron. To calculate the value of some neuron, you add the values of all incoming connections and apply the sigmoid function to that sum. Other activation functions are possible, but I have not implemented them yet.
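
As a minimal sketch of that rule (the neuron values and weights below are just example numbers):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_value(incoming_values, weights):
    # value of each connection = weight * value of the first connected neuron;
    # the neuron's value is the sigmoid of the sum of all incoming connections
    total = sum(w * v for w, v in zip(weights, incoming_values))
    return sigmoid(total)

# hypothetical example: two neurons feeding one neuron
print(neuron_value([1.0, 0.5], [0.4, -0.2]))  # sigmoid(0.4 - 0.1) = sigmoid(0.3)
```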

Back propagation

The cost function defined above depends on the weights of the connections in the same way as \(f(x, y) = x^2 + y^2\) depends on x and y. In the beginning, the weights are random. Let's say x = 5 and y = 3. The cost at this point would be 25 + 9 = 34, which we want to get to 0. Now we take the derivative with respect to each of these weights, which tells us how to adjust the weights to minimize the function: \(\frac{\partial f(x, y)}{\partial x} = 2x\), \(\frac{\partial f(x, y)}{\partial y} = 2y\). Now that we have the derivatives, we know the "direction" in which to change the weights. \(x_{new} = x_{old} - rate \times 2x = 5 - 0.1 \times 2 \times 5 = 4\), and that's a little bit closer to the desired 0 result of f(x, y). The rate is necessary to avoid stepping over the minimum.
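
The gradient-descent step described above can be run directly on the toy function \(f(x, y) = x^2 + y^2\):

```python
def f(x, y):
    return x ** 2 + y ** 2

x, y = 5.0, 3.0  # "random" starting weights from the example
rate = 0.1       # learning rate, keeps us from stepping over the minimum

for step in range(3):
    dx, dy = 2 * x, 2 * y  # partial derivatives of f
    x -= rate * dx         # x_new = x_old - rate * 2x
    y -= rate * dy
    print(step, x, y, f(x, y))
```

After the first step x is 4, exactly as in the worked example; each further step shrinks both weights by another factor of 0.8, so f keeps moving toward 0.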

In practice, the computation of the derivatives is a little bit harder, but all you need to know is the chain rule. I highly recommend 3blue1brown's series and this paper for better understanding.
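
To see why the chain rule is all you need, here is a minimal sketch for a single weight feeding a single sigmoid neuron (the input, weight, and target values are made up for illustration; a real network just repeats this layer by layer):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# one neuron, one weight, one training sample (hypothetical values)
inp, w, target = 1.5, 0.8, 1.0

z = w * inp              # weighted input to the neuron
a = sigmoid(z)           # neuron value
cost = (a - target) ** 2

# chain rule: dC/dw = dC/da * da/dz * dz/dw
dC_da = 2 * (a - target)
da_dz = a * (1 - a)      # derivative of the sigmoid at z
dz_dw = inp
grad = dC_da * da_dz * dz_dw
print(grad)
```

The weight would then be updated as `w -= rate * grad`, exactly like x and y in the toy example above.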