In my last post, I covered the key points to understand neural networks, based on what I learnt in the Neural Networks and Deep Learning Coursera course taught by Andrew Ng.
In this post, the aim is to go deeper and understand the various internal components of a neural network. This post is written considering Neural network usage for Supervised learning.
The aim of the post is to
- Help non-technical folks understand neural networks
- Explain the internal components of neural networks in simple language
A quick recap of the basics of Neural networks
A neural network is a technique used in Machine learning. In neural networks, multiple layers (stacked together) are used to process the input and understand what the input means. The presence of multiple layers helps in learning the underlying pattern in the training dataset better.
The layers are nothing but mathematical formulae of different types. Hence, a trained neural network is nothing but a neural network which has finalised values for the various variables used in its mathematical formulae. (To understand the basics of neural networks, do check out my last post.)
With that summary in mind, let's proceed.
Before we jump into the details of neural networks, let's define the goal behind developing a neural network.
The goal of developing a Neural network
The goal is to have an algorithm which can process the training set iteratively and learn the pattern in it. The learnt pattern is what links each input in the training set to its expected output.
This neural network can then be saved and used in further tasks like Prediction.
E.g. to develop a neural network that identifies a cat from pictures, we will train the neural network on, say, 1000 labelled pictures, some of cats and some not. During the training, the neural network will try to understand the pattern and conclude the values for the various algorithm variables. If this trained model is given a new picture of a cat, it will give an output of 1 (which means the input picture is of a cat).
Let's proceed to the next step, i.e. to understand the internals of a neural network.
What are the components of a neural network
The following are the internal components of a Neural network
- Layers
- Size of the layer
- Training dataset
- Layer variables/Algorithm variables/Neural network variables
- Learning process
- Activation function
- Forward propagation
- Cost function
- Gradient descent
- Backward propagation
- Update Algorithm variables
- Static variables
- Number of iterations
- Learning rate
- What is a layer in a neural network
- A neural network consists of more than one layer
- Each neural network layer is a collection of one or more neurons.
- A neuron can be defined as
Neuron = Processing formula (e.g Regression) + Neuron activation function
- The layers between the input and output are called Hidden layers.
- In fact, by convention the input layer is not counted when stating how many layers a network has; only the hidden layers and the output layer are counted.
- Keep reading to know about Activation function
- Size of the layer – Number of neurons in the layer
A neural network layer might have more than one neuron. The number of neurons (also called units) in a layer is known as the size of the layer. The size of the layer is chosen by the designer of the network, usually based on the complexity of the input data.
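To make the idea of neurons and layer size concrete, here is a minimal sketch in Python. It assumes a weighted sum as the processing formula and sigmoid as the activation function; the function names `neuron` and `layer` and all the numbers are made up for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: a processing formula followed by an activation function."""
    # Processing formula: weighted sum of the inputs plus a bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation function: sigmoid squashes z into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, layer_weights, layer_biases):
    """A layer is a collection of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(layer_weights, layer_biases)]

# A layer of size 2 (two neurons) reading 3 input values.
outputs = layer([1.0, 0.0, 1.0],
                [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
                [0.0, 0.1])
print(len(outputs))  # one output per neuron, so the layer size is 2
```

Adding one more row of weights and one more bias would grow the layer to size 3 without changing anything else.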
- Input to a Neural network – the Training Dataset
The input to a neural network is a dataset called the Training set. The training set has the input data and also the expected output or meaning of the data.
For e.g. the training set pertaining to, say, images of cats will have the RGB profile of each image as a row, along with a label stating whether the image is of a cat (1) or not (0). A sample dataset is given below
Pixel profile | Cat |
[[1,0,0],[0,1,1]] | 1 |
[[1,0,0],[1,1,1]] | 0 |
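The sample dataset above could be held in Python roughly like this. This is only a sketch: flattening each pixel profile into one list of numbers is an assumption about how the data would be fed to the network, and the names `training_set`, `flatten`, `X` and `Y` are made up for illustration.

```python
# Each training example pairs a pixel profile with a label (1 = cat, 0 = not a cat).
training_set = [
    ([[1, 0, 0], [0, 1, 1]], 1),
    ([[1, 0, 0], [1, 1, 1]], 0),
]

def flatten(pixel_profile):
    """Turn the nested pixel profile into one flat list of numbers."""
    return [value for pixel in pixel_profile for value in pixel]

X = [flatten(pixels) for pixels, _ in training_set]  # inputs
Y = [label for _, label in training_set]             # expected outputs
print(X)  # [[1, 0, 0, 0, 1, 1], [1, 0, 0, 1, 1, 1]]
print(Y)  # [1, 0]
```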
- Algorithm variables
The algorithm variables are the variables used in the mathematical formula of each neuron in the layers.
For e.g. if the neuron uses linear regression, which has the formula
Z=W.X(transpose)+b
then W and b are the layer variables. X is the input to the layer and Z is the output of the linear regression step.
Z is then passed as an input to the activation function of the layer. The output of the activation function is the input to the next layer.
W and b are initialized randomly for the first run, for all the layers.
Before the next iteration, the values of W and b are updated based on the calculation of the Gradient Descent. Keep reading to understand what Gradient Descent is.
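As a rough sketch of the formula above for a single neuron, the snippet below initializes W and b randomly and computes Z for one input row. The range of the random values, the fixed seed and the helper name `linear` are assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an arbitrary choice)

n_inputs = 3
# W and b start as small random numbers on the first run.
W = [random.uniform(-0.01, 0.01) for _ in range(n_inputs)]
b = random.uniform(-0.01, 0.01)

def linear(W, X, b):
    """Z = W.X(transpose) + b for one neuron and one input row X."""
    return sum(w * x for w, x in zip(W, X)) + b

Z = linear(W, [1, 0, 0], b)
print(Z)  # this value is what gets passed to the activation function
```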
- How a neural network learns
- Iteratively process the training dataset
- Compare the neural network output to the expected output as per the training dataset.
- Find how far the neural network output is from the desired output
- And fine-tune the neural network variables to minimize the gap.
- What is an activation function
The output of a given layer depends upon the activation function used in the layer.
An activation function is a mathematical formula and it acts as a gate.
The gate decides how much of the neuron's signal is let through. It takes the value Z calculated by the processing formula, converts it into a new value, and that new value is passed on to the next layer. Activation functions are also what let a neural network learn patterns that are not simple straight lines.
Some of the common activation functions are Sigmoid, ReLU and tanh.
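The three activation functions named above can be written in a few lines of Python. This is a minimal sketch using their standard textbook definitions; the loop at the end just prints what each one does to a negative, zero and positive input.

```python
import math

def sigmoid(z):
    """Squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Lets positive values through unchanged and blocks negative ones."""
    return max(0.0, z)

def tanh(z):
    """Squashes any number into the range (-1, 1)."""
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), relu(z), tanh(z))
```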
- What is forward propagation
In forward propagation, the training set is passed through the neural network one layer at a time: each layer processes the output of the previous layer and gives out its own output. The output of the last layer is compared to the desired output as per the training dataset. This comparison is called calculating the Cost.
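Here is a small sketch of forward propagation for one training example. It assumes every layer uses a weighted sum followed by sigmoid; the layer sizes, the values of the variables and the function name `forward` are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Pass one input through every layer in order.

    `layers` is a list of (weights, biases) pairs, one pair per layer;
    `weights` holds one row of weights per neuron in that layer.
    """
    activation = x
    for weights, biases in layers:
        # The output of this layer becomes the input to the next layer.
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

# Made-up variables: a hidden layer of size 2, then an output layer of size 1.
layers = [
    ([[0.5, -0.5, 0.2], [0.1, 0.3, -0.4]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
prediction = forward([1, 0, 1], layers)[0]
print(prediction)  # a number between 0 and 1
```

The prediction would then be compared with the label of that training example to get the loss.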
- What is backward propagation
In backward propagation, the learning process is executed in the backward direction.
For each layer, we calculate how much the cost would change if the variables of that layer (W and b) were changed slightly. These values are called gradients.
We say it is backward propagation because the calculation starts from the output of the last layer and continues, layer by layer, till we reach the first layer.
By doing this we have found, for each layer, how its variables influence the final cost, and this is stored for the update step.
For the calculation, we use the concept of derivatives from calculus. You get to know the maths in detail in the course.
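To give a feel for the derivatives involved, here is a sketch of backward propagation for the simplest possible case: a single sigmoid neuron with cross-entropy loss on one training example. The choice of loss, the numbers and the function names are assumptions for illustration. The last lines check the result by nudging b slightly and measuring how much the loss really moves.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(W, b, x, y):
    """Cross-entropy loss of a single sigmoid neuron on one example."""
    a = sigmoid(sum(w * xi for w, xi in zip(W, x)) + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def backward(W, b, x, y):
    """Derivatives of the loss with respect to W and b (chain rule)."""
    a = sigmoid(sum(w * xi for w, xi in zip(W, x)) + b)
    dz = a - y                  # how the loss changes with Z
    dW = [dz * xi for xi in x]  # how the loss changes with each weight
    db = dz                     # how the loss changes with the bias
    return dW, db

W, b, x, y = [0.2, -0.1], 0.05, [1.0, 2.0], 1
dW, db = backward(W, b, x, y)

# Sanity check: nudge b slightly and see how much the loss actually moves.
eps = 1e-6
numeric_db = (loss(W, b + eps, x, y) - loss(W, b - eps, x, y)) / (2 * eps)
print(db, numeric_db)  # the two numbers should agree closely
```

In a network with several layers the same chain rule is applied layer by layer, starting from the last one.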
- What is a Cost function
The difference between the algorithm output and the desired output (as per the training dataset) for a single training example is given by the loss function.
When the loss is combined over the entire training set (summed up and usually averaged), it is called the Cost function.
In simple words, the cost function tells you how far the output of the neural network is from the desired output for a given iteration, considering all training examples.
You want the cost to be minimal at the end of training of the model.
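A short sketch of loss and cost in Python follows. It assumes cross-entropy as the loss function (a common choice for yes/no outputs such as cat or not cat) and takes the cost as the average of the losses, which is the sum scaled by the number of examples. The predictions are made-up numbers.

```python
import math

def loss(predicted, expected):
    """Cross-entropy loss for one training example (an assumed choice)."""
    return -(expected * math.log(predicted)
             + (1 - expected) * math.log(1 - predicted))

def cost(predictions, labels):
    """Average of the per-example losses over the whole training set."""
    losses = [loss(p, y) for p, y in zip(predictions, labels)]
    return sum(losses) / len(losses)

labels = [1, 0, 1]
good = cost([0.9, 0.1, 0.8], labels)  # predictions close to the labels
bad = cost([0.4, 0.7, 0.3], labels)   # predictions far from the labels
print(good, bad)  # the better predictions give the lower cost
```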
- What is gradient descent
Gradient descent is the name of the approach taken to update the algorithm variables.
After completing the backward propagation, we know for each layer how its variables affect the difference between the expected output and the output of the algorithm.
Using this information, we want to change the algorithm variables so that the difference is reduced. This approach of changing the algorithm variable values using data from backward propagation is known as Gradient descent.
Graphically, it is like moving down a slope from the top position (a large gap between the algorithm output and the expected output) towards the ground position (the algorithm output is as close as possible to the expected output).
How fast or slow we roll down this slope is controlled by a static variable called the Learning rate.
- Update Algorithm variables
The output of the backward propagation for a layer is multiplied by the learning rate, and the result is subtracted from the current variable values to get the new algorithm variable values to be used in the next iteration.
The formula is –
New variable value for the given layer = old variable value for the given layer - learning rate * variable value from the backward propagation for the given layer
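The update formula can be tried out on a toy example. The sketch below is not a neural network: it assumes a made-up cost with a single variable w, cost(w) = (w - 3) ** 2, whose lowest point is at w = 3, and applies the update rule repeatedly. The helper name `update` is hypothetical.

```python
def update(old_value, gradient, learning_rate):
    """New value = old value - learning rate * gradient."""
    return old_value - learning_rate * gradient

# Toy cost: cost(w) = (w - 3) ** 2, whose gradient is 2 * (w - 3).
# The lowest point of this slope is at w = 3.
w = 0.0
learning_rate = 0.1
for step in range(50):
    gradient = 2 * (w - 3)
    w = update(w, gradient, learning_rate)
print(w)  # ends up very close to 3
```

In a real neural network the same rule is applied to every W and b of every layer, with the gradient coming from the backward propagation.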
- What are the static variables that control the neural network
- The first static variable is the Learning rate. It is defined by the Data Scientist. The learning rate is used in gradient descent to decide by how much the values of the algorithm variables (in this post, W and b) should be updated. In other words, it decides the size of the jump between the values of a variable in 2 consecutive iterations. If the learning rate is too high, the updates can overshoot and the neural network might never settle on good values. And if the learning rate is too low, the neural network learns very slowly and might not complete the learning process within the given number of iterations.
- In a single iteration of Forward propagation and Backward propagation, the neural network might not be trained well enough. Hence, these 2 steps are repeated multiple times. The number of repetitions is called the Number of Iterations and is another static variable set by the Data Scientist.
Training the neural network is all about updating the algorithm variables iteratively such that the output of the trained model is satisfactory (the gap between the expected output and the algorithm output is minimal).
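The effect of the two static variables can be seen on a toy example. The sketch below assumes a made-up cost, cost(w) = w ** 2, with its lowest point at w = 0, and runs the same number of iterations with three different learning rates. The function name `run` and the chosen rates are only for illustration.

```python
def run(learning_rate, steps=20):
    """Gradient descent on cost(w) = w ** 2, starting from w = 1."""
    w = 1.0
    for _ in range(steps):
        w = w - learning_rate * (2 * w)  # gradient of w ** 2 is 2 * w
    return w

print(run(0.001))  # too low: w has barely moved away from 1
print(run(0.1))    # reasonable: w is close to the minimum at 0
print(run(1.5))    # too high: w overshoots and grows on every step
```

Raising `steps` would eventually rescue the low learning rate, but no number of iterations helps the one that is too high.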
Putting it all together – The flow of the Neural Network training
In each iteration,
- Initialize the algorithm variables for all the layers randomly if it is the first iteration
- In the forward propagation
- The neural network will process the training set
- Each layer will process the output of the previous layer and pass its own output on to the next layer based on its activation function.
- The intermediate output of each layer is stored in memory (cache), as it is needed again in the backward propagation.
- Calculate the cost function by combining the loss function over all the training examples. This is required to know if the algorithm is learning anything.
- Start the backward propagation to know how the variables of each layer affect the cost. The processing starts from the last layer and goes till the first layer.
- Start the gradient descent – update the algorithm variables using the learning rate and the output of the backward propagation step. We call it gradient descent as the output of the backward propagation step is a set of derivatives (gradients) of the cost, and we move the algorithm variables in the direction that brings the cost down.
- Repeat the iteration with the new algorithm values. After each iteration, if the algorithm is learning the cost function value should keep on decreasing.
- After exhausting the number of iterations, check the accuracy
- This is calculated by comparing the output of the algorithm to the actual output for a given number of test examples.
At the end of the iterations, we expect to have a model which is able to predict the output of test data satisfactorily. This is called a Trained model.
The trained model can be saved and used for prediction in later jobs.
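The whole flow can be sketched in a few dozen lines of plain Python. To keep it short, this sketch makes several assumptions: the "network" is a single sigmoid neuron, the loss is cross-entropy, the training set is a made-up one with four examples, and accuracy is checked on the training examples themselves rather than on separate test examples. The names `train` and `predict` and the default values of the static variables are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, Y, learning_rate=0.5, iterations=200, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Initialise the algorithm variables randomly before the first iteration.
    W = [rng.uniform(-0.01, 0.01) for _ in range(n)]
    b = rng.uniform(-0.01, 0.01)
    costs = []
    for _ in range(iterations):
        # Forward propagation over the whole training set.
        A = [sigmoid(sum(w * xi for w, xi in zip(W, x)) + b) for x in X]
        # Cost: average loss over all training examples.
        costs.append(-sum(y * math.log(a) + (1 - y) * math.log(1 - a)
                          for a, y in zip(A, Y)) / len(X))
        # Backward propagation: gradients of the cost for W and b.
        dZ = [a - y for a, y in zip(A, Y)]
        dW = [sum(dz * x[j] for dz, x in zip(dZ, X)) / len(X) for j in range(n)]
        db = sum(dZ) / len(X)
        # Gradient descent: update the algorithm variables.
        W = [w - learning_rate * g for w, g in zip(W, dW)]
        b = b - learning_rate * db
    return W, b, costs

def predict(W, b, x):
    """Output 1 if the trained neuron is more than 50% sure, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(W, x)) + b) > 0.5 else 0

# Made-up training set: the label is 1 whenever the first value is 1.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y = [1, 1, 0, 0]
W, b, costs = train(X, Y)
print(costs[0], costs[-1])  # the cost at the end is lower than at the start
accuracy = sum(predict(W, b, x) == y for x, y in zip(X, Y)) / len(X)
print(accuracy)  # fraction of examples predicted correctly
```

The returned W and b are the trained model: saving them is enough to make predictions later with `predict`.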
Hope reading this post has helped you in understanding the internal working of neural networks. Do share your feedback in the comments…
Happy learning!!!