Neural networks are fairly old machine learning algorithms, originally developed with the idea of mimicking the human brain. The human brain is the best learning mechanism we know of; a machine that could truly mimic it would be genuinely intelligent. We are not there yet, but neural networks are still very effective at machine learning. They were popular in the 1980s and 1990s, and recently they have become popular in machine learning and deep learning again, probably because modern computers are efficient enough to run large neural network algorithms.
Neural Network Basics
Time to dive into the actual neural network algorithm. Let’s start with some basic ideas and notation. This is what a basic neuron looks like.
X0 is the bias unit, or bias neuron; its value is always 1. X1, X2 and X3 are the input units. From these input units we compute the hidden layer, and from the hidden layer we compute the final output.
As in the logistic regression algorithm, X needs to be parameterized by theta, so each X has a corresponding theta. X and theta are vectors of the same size. These are the basic ideas and notation for a single neuron. Now look at the computation process for a neural network.
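The single-neuron computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article’s own code; the input and theta values here are made up for the example, and the sigmoid activation is assumed from the logistic regression discussion.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values: bias unit x0 = 1 plus three inputs x1, x2, x3.
x = np.array([1.0, 0.5, -1.2, 0.3])      # [x0, x1, x2, x3]
theta = np.array([0.1, 0.4, -0.3, 0.2])  # one theta per input, same size as x

# The neuron sums the inputs weighted by theta, then applies the activation.
output = sigmoid(theta @ x)
```

Because the sigmoid maps any real number into (0, 1), the neuron’s output can be read as a probability-like value, just as in logistic regression.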
When several neurons are combined into a more complex model, the result is a neural network. A neural network has at least three layers. Here is a diagram:
In this example, Layer 1 is the input layer, Layer 2 is a hidden layer and Layer 3 is the output layer. Any layer that is not an input or output layer is a hidden layer, and there may be more than one of them.
There are computations involved in moving from one layer to the next. The diagram above introduces one more piece of notation, ‘a’, alongside the theta we already discussed.
In the neural network diagram, every ‘a’ has a subscript and a superscript. The superscript denotes the layer: the a’s with superscript 2 are in Layer 2. The subscript denotes the unit within that layer. To compute an ‘a’, the unit takes the inputs parameterized by their thetas. As the diagram shows, each ‘a’ is connected to all of the inputs, so computing an ‘a’ means summing all the inputs multiplied by their corresponding thetas and passing that sum through the activation function. Here the logistic function, or sigmoid function, is the activation function.
To simplify the equations above, replace the theta-times-X part with z, just as we did in logistic regression.
Here theta1 is a 3×4 matrix: 3 comes from the 3 units in Layer 2, and 4 comes from the 3 units in Layer 1 plus one for the bias term X0, whose value is always 1.
This way, each ‘a’ gets a ‘z’ value, and together they form the vector z2. Here both ‘a’ and ‘z’ are 3-dimensional vectors, so the activation function is applied to each element of the z2 vector.
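The vectorized version of this layer computation can be sketched as a single matrix-vector product. The Theta1 values below are random placeholders standing in for learned parameters; only the shapes match the article’s description (3 units in Layer 2, 3 inputs plus a bias in Layer 1).

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: Theta1 is 3x4, i.e. 3 Layer-2 units by (3 inputs + bias).
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))

x = np.array([1.0, 0.5, -1.2, 0.3])  # input vector with bias unit x0 = 1

z2 = Theta1 @ x   # z2 is a 3-dimensional vector, one z per Layer-2 unit
a2 = sigmoid(z2)  # activation applied element-wise gives the 3 a's of Layer 2
```

Computing the whole layer as one matrix multiplication is exactly the simplification the z notation buys us: no loop over individual units is needed.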
With all the values of Layer 2 in hand, the final output can be computed using the same method. It is a lot simpler now, because the output is a single neuron: take the ‘a’ values parameterized by theta, sum them up and pass the sum through the activation function. That is our final output.
This equation has an ‘a0’ term because, as usual, we add a bias term in Layer 2 whose value is 1, so the a2 vector becomes 4-dimensional. Simplifying all the terms, we get,
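This output-layer step can be sketched the same way as the hidden layer. The a2 activations and Theta2 weights below are made-up numbers; the point is the shapes: a 4-dimensional a2 (bias a0 = 1 plus three activations) multiplied by a 1×4 Theta2 to produce one output.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical Layer-2 activations with the bias unit a0 = 1 prepended.
a2 = np.array([1.0, 0.67, 0.31, 0.88])

# Hypothetical weights: 1 output unit by (3 hidden units + bias).
Theta2 = np.array([[0.2, -0.5, 0.7, 0.1]])

h = sigmoid(Theta2 @ a2)  # final output of the network, a single value
```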
This process of computing the output, from the input layer through the hidden layer to the output layer, is called forward propagation.
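Putting the two layer computations together, the whole forward pass can be sketched as one function. This is a minimal sketch of the idea under the article’s assumptions (3 inputs, 3 hidden units, 1 output, sigmoid activation); the random Theta matrices stand in for parameters that would normally be learned.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Forward propagation through a 3-layer network.

    x is the input vector WITHOUT the bias unit; the bias 1s are added here.
    """
    a1 = np.concatenate(([1.0], x))   # Layer 1: prepend bias unit x0 = 1
    z2 = Theta1 @ a1                  # weighted sums for Layer 2
    a2 = sigmoid(z2)                  # Layer 2 activations
    a2 = np.concatenate(([1.0], a2))  # prepend bias unit a0 = 1 in Layer 2
    z3 = Theta2 @ a2                  # weighted sum for the output layer
    return sigmoid(z3)                # final output

# Hypothetical shapes matching the diagram: 3 inputs, 3 hidden units, 1 output.
rng = np.random.default_rng(1)
Theta1 = rng.standard_normal((3, 4))  # 3 hidden units x (3 inputs + bias)
Theta2 = rng.standard_normal((1, 4))  # 1 output x (3 hidden units + bias)

y = forward_propagate(np.array([0.5, -1.2, 0.3]), Theta1, Theta2)
```

Each layer repeats the same pattern, prepend a bias unit, multiply by the layer’s Theta, apply the activation, which is why adding more hidden layers does not change the structure of the computation.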
I am finishing this article here, but it will continue. More conceptual discussion and real code examples are coming soon.
I am grateful to this Coursera course for the detailed ideas about neural networks: https://www.coursera.org/learn/machine-learning?