Introduction to Neural Networks (NN) and How They Work

Nature has inspired many inventions: birds inspired us to fly, and the airplane was invented. Similarly, the inspiration to build an intelligent machine came from the brain's architecture and its working mechanism. We can say that the Artificial Neural Network (ANN) was inspired by the architecture of the Biological Neural Network (BNN).
Note: In Machine Learning, the term Neural Network (NN) generally refers to an ANN.

Neural Networks are at the very core of Deep Learning. They are powerful, scalable, and capable of tackling large and complex ML tasks such as speech recognition and natural language translation. Services powered by neural nets include Windows Cortana, DeepMind's AlphaGo, and many more.

ANNs were first introduced back in 1943 by the neurophysiologist Warren McCulloch and the mathematician Walter Pitts. Since then, ANNs have seen periods of success and decline, and new machine learning techniques were invented along the way, but major breakthroughs were a long time coming. The tremendous increase in computing power and the huge quantity of data available for training are what have finally let us leverage the power of neural networks.

Perceptron

The perceptron is analogous to a biological neuron and is commonly known as an artificial neuron. It is the basic building block of neural networks. A connection of artificial neurons that simulates the information-processing paradigm of the biological nervous system is called an Artificial Neural Network.
[Figure: A Biological Neuron]

Here's a little intuition about how a biological neuron works. A neuron receives signals via its dendrites. The soma (cell body) processes the information, and the neuron fires if the signal reaches a threshold value. The signal is carried down the axon and transmitted to other neurons via synapses. The amount of signal transferred across a synapse defines the strength of the connection between neurons.

The perceptron is a single-layer neural network and one of the simplest ANN architectures. In the figure below, x1, x2, x3 are the inputs and w1, w2, w3 are the corresponding weights associated with the inputs. z is the weighted sum of the inputs plus the bias (i.e. z = w1*x1 + w2*x2 + w3*x3 + b). The bias b acts like a threshold value that determines whether the neuron activates or not. f(z) is the activation function that computes the output y. In a neural network, the output y is passed on to other neurons as an input.
[Figure: Perceptron (An Artificial Neuron)]
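To make this concrete, here is a minimal sketch of a perceptron's forward computation in Python. The step activation and the input, weight, and bias values are illustrative, not taken from the figure:

```python
# A minimal perceptron: weighted sum of inputs plus bias, then an activation.
def step(z):
    """Step activation: fire (output 1) if z reaches the threshold, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    # z = w1*x1 + w2*x2 + w3*x3 + b
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return step(z)  # y = f(z)

x = [1.0, 0.5, -1.0]  # inputs x1, x2, x3 (illustrative)
w = [0.4, 0.6, 0.2]   # weights w1, w2, w3 (illustrative)
b = -0.3              # bias
print(perceptron(x, w, b))  # -> 1, since z = 0.4 + 0.3 - 0.2 - 0.3 = 0.2 >= 0
```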

The learning process of the perceptron is based on Hebbian learning, which states that the weight between two neurons increases if the two neurons activate simultaneously and decreases if they activate separately. The perceptron is fed one training instance at a time, and for each instance it makes a prediction. If the prediction is wrong, the perceptron reinforces the connection weights from the inputs that would have contributed to the correct prediction, as sketched below.
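Here is a minimal sketch of that update, the classic perceptron learning rule; the learning rate of 0.1 is an assumed value:

```python
# One step of the perceptron learning rule (illustrative learning rate).
# If the prediction is wrong, each weight is nudged in the direction that
# would have produced the correct output for this instance.
def train_step(x, target, w, b, lr=0.1):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = 1 if z >= 0 else 0
    error = target - y  # 0 if correct, +1 or -1 if wrong
    w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    b = b + lr * error
    return w, b

w, b = train_step([1.0, 0.5, -1.0], target=0, w=[0.4, 0.6, 0.2], b=-0.3)
print(w, b)  # weights move away from firing, since the target was 0
```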

Multi Layer Perceptron (MLP)

A Multi Layer Perceptron is formed by stacking layers of perceptrons and is what is commonly known as a Neural Network. A single neuron is a computational unit; stacking many of them with non-linear activation functions is what lets a neural network model non-linear relationships.

Neural Network consists of one input layer, one or more hidden layers and one output layer. Each layer except the output layer includes a bias neuron. When a neural network has two or more hidden layers, it is called a Deep Neural Network (DNN).
[Figure: A Neural Network]

Each hidden layer can have an arbitrary number of neurons. The number of neurons in the input and output layers depends on your input data and desired output. A neural network generates a prediction by passing the inputs through all the layers, up to the output layer; this process is called feedforward (the forward pass). Backpropagation is used to train a neural network.
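As a sketch of a forward pass, here is a tiny MLP in Python with NumPy. The layer sizes and random weights are placeholders, not values from the figure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sizes: 3 input features, 4 hidden neurons, 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def relu(z):
    return np.maximum(0, z)

def feedforward(x):
    h = relu(W1 @ x + b1)  # hidden layer: weighted sum, then activation
    return W2 @ h + b2     # output layer (no activation, as for regression)

x = np.array([0.2, -0.5, 1.0])  # one input instance (illustrative)
print(feedforward(x))           # the network's prediction
```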

Among the many neural network architectures, three widely used ones are:
  • Convolutional Neural Network
  • Recurrent Neural Network
  • Autoencoders

Backpropagation

Backpropagation (backward propagation of errors) is the most common training algorithm for neural networks. It is often mentioned together with Gradient Descent: backpropagation computes the gradient of the cost function with respect to every weight, and Gradient Descent uses those gradients to update the weights. Generally, training a neural network means learning the best weights for every link in the network, i.e. the weights that contribute to correct predictions and reduce the cost function.
The cost function (also called the error function; it is the average loss over training instances) measures the performance of a neural network model by comparing the predicted output with the actual output. There are different kinds of cost functions, and Mean Squared Error (MSE) is one of them. In the following function, the objective of backpropagation is to minimize the MSE value, where Yi is the target value and Yi^ is the predicted value of the ith training instance.
MSE = (1/n) * Σ (Yi - Yi^)^2, summed over all n training instances
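For example, a direct translation of this formula into Python; the target and predicted values are made up for illustration:

```python
# MSE over n training instances: the mean of the squared differences.
targets     = [3.0, -0.5, 2.0]  # Yi (made-up values)
predictions = [2.5,  0.0, 2.0]  # Yi^ (made-up values)

n = len(targets)
mse = sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)) / n
print(mse)  # (0.25 + 0.25 + 0.0) / 3 = 0.1666...
```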

For each training instance, the backpropagation algorithm first makes a prediction (forward pass), measures the error, then propagates the error backward through the network to measure the error contribution from each connection (reverse pass), and finally updates the weights using Gradient Descent.
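Here is a minimal sketch of those steps for a single linear neuron with one weight, just to show the pattern; the training instance, initial weight, and learning rate are illustrative:

```python
# One backpropagation + gradient descent step for a single linear neuron,
# y_hat = w * x, with squared-error loss L = (y - y_hat)^2.
x, y = 2.0, 4.0  # one training instance (illustrative)
w = 0.5          # initial weight (illustrative)
lr = 0.05        # learning rate (illustrative)

y_hat = w * x                  # 1. forward pass: make a prediction
loss = (y - y_hat) ** 2        # 2. measure the error
grad_w = -2 * (y - y_hat) * x  # 3. reverse pass: dL/dw via the chain rule
w = w - lr * grad_w            # 4. update the weight with Gradient Descent

print(loss, w)  # loss = 9.0, updated w = 0.5 + 0.05 * 12 = 1.1
```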

To understand backpropagation in detail, check out this awesome visual explanation.
Despite all its benefits, two common problems, Vanishing Gradients and Exploding Gradients, can prevent neural networks from converging.

Activation Function

The activation function is one of the most important components of a neural network: it is what introduces non-linearity. An activation function is used in every layer except the input layer, and different layers can have different activation functions. Each neuron applies its activation function to its input to produce its output.
The ReLU activation function is commonly used in hidden layers. For the output layer, Softmax is the most popular activation function for classification tasks, while no activation function is typically used for regression tasks.
The most common types of activation function, sketched in code after the list, are as follows:

  1. Sigmoid Activation Function
    It takes an input and outputs a value in the range (0, 1).
  2. Hyperbolic Tangent Function (tanh)
    It takes an input and outputs a value in the range (-1, 1).
  3. Rectified Linear Unit (ReLU)
    It outputs ReLU(z) = max(0, z), i.e. a value in the range [0, ∞).
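A minimal sketch of these three functions using NumPy; the sample inputs are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # output in (0, 1)

def tanh(z):
    return np.tanh(z)            # output in (-1, 1)

def relu(z):
    return np.maximum(0, z)      # output is max(0, z), i.e. in [0, inf)

z = np.array([-2.0, 0.0, 3.0])   # sample inputs (illustrative)
print(sigmoid(z), tanh(z), relu(z))
```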

Dropout Regularization

Overfitting is a major problem in machine learning models: it prevents a model from generalizing well. Regularization is important for overcoming overfitting, and there are many regularization techniques for neural nets. Dropout is among the most useful regularization techniques for neural networks, especially in Computer Vision. It works by randomly "dropping out" neurons in the network during training. The more you drop out, the stronger the regularization.
Dropout regularizes by effectively training a smaller network at each step and by spreading out a neuron's weights among all of its input features.
A standard neural net without dropout, in which every neuron is connected to every neuron in the adjacent layers, is known as a fully connected neural network (FCNN). A minimal sketch of dropout follows.
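Here is a minimal sketch of the commonly used "inverted dropout" variant applied to one layer's activations during training; the drop rate and activation values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_rate=0.5):
    """Inverted dropout: zero each neuron with probability drop_rate during
    training, then rescale survivors so the expected activation is unchanged."""
    keep_prob = 1.0 - drop_rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.array([0.3, 1.2, 0.0, 0.8, 2.1])  # one layer's activations (illustrative)
print(dropout(h))  # roughly half the activations are zeroed at random
```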

In Summary,
  • Artificial Neural Network, or simply Neural Network, is a network of neurons inspired by biological neural networks. Mathematically, it is just a function; with enough neurons, it can approximate a very wide class of functions.
  • Neural Network introduces non-linearity to the model and learns different feature crosses by itself.
  • When a neural network has two or more hidden layers, it is called a Deep Neural Network.
  • Backpropagation is the standard training algorithm for neural networks, especially in deep learning.
  • The activation function is one of the most important components in a neural network.
  • Neural network with many neurons and hidden layers isn't always the best.
