*This post is part of my #100DaysOfMLCode challenge, where I learn about Machine Learning for 100 days straight. The images mentioned here are from the specialization course I am taking on Coursera.

*Please excuse my writing skills; the information is also not well organized. This post is about what I understood from the videos I watched.

Coursera’s Deep Learning Specialization course homepage: https://www.coursera.org/learn/neural-networks-deep-learning#

# Neural Networks Basics

This is Week 2 of the Neural Networks and Deep Learning course.

## Logistic Regression as a Neural Network

This week we go over the basics of Neural Network programming.

• Implementation techniques play an important role when building a neural network.
• This week we learn a technique (vectorization) that lets a neural network process the entire training set of m examples without an explicit `for` loop.
• We also learn why the computation of a neural network is organized into a forward propagation step and a backward propagation step.

## Binary Classification

• Logistic regression is an algorithm for binary classification.
• In a binary classification problem, the input can be an image and we need to decide whether the image shows a cat or not. We use y to denote the output label (1 = cat, 0 = not cat).
• The computer stores an image as three matrices, one each for the Red, Green, and Blue color channels, holding the pixel intensity values. For a 64 by 64 image, there are three 64 by 64 matrices.
• We unroll the pixel intensity values from these matrices into a single feature vector x. Since there are three matrices, its dimension is n_x = 3 × 64 × 64 = 12,288.
• In binary classification, our goal is to train a classifier that takes an image as input, in the form of a feature vector x, and predicts the output label y.
• Notation: m is the number of training examples. X is the matrix containing the m training inputs x, stacked column-wise.
• Likewise, Y contains the m output labels, also arranged column-wise.
• In Python, the NumPy library can report the dimensions of a matrix through the `shape` attribute. Usage: `X.shape` (it is an attribute, not a function call).
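The notation above can be sketched in NumPy. This is a minimal example assuming a hypothetical batch of m = 10 random 64 × 64 RGB images; the variable names are mine, not from the course:

```python
import numpy as np

# Hypothetical batch of m = 10 RGB images, each 64 x 64 x 3
m = 10
images = np.random.rand(m, 64, 64, 3)

# Unroll each image into a column of n_x = 3 * 64 * 64 = 12288 features;
# transpose so the m examples are stacked column-wise, as in the notation
X = images.reshape(m, -1).T

# Labels (1 = cat, 0 = not cat), arranged column-wise as a row vector
Y = np.random.randint(0, 2, size=(1, m))

print(X.shape)  # (12288, 10)
print(Y.shape)  # (1, 10)
```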

## Logistic Regression

• Logistic regression is a learning algorithm for binary classification, a supervised learning problem whose output labels are 0 or 1.
• Here w and b are the parameters: w is a weight vector and b is a bias.
• A plain linear function w᷀ᵀx + b would not be appropriate, because its value can be negative or larger than 1, and we want a probability.
• So we apply the sigmoid function to the linear output: ŷ = σ(wᵀx + b).
• If z is a large positive number, then σ(z) is nearly 1. If z is a large negative number, then σ(z) is nearly 0.
• The graph depicts σ(z) as a function of z.
• Our goal is to learn parameters w and b such that ŷ becomes a good estimate of the probability that y = 1.
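A quick sketch of the sigmoid and of a prediction ŷ = σ(wᵀx + b). The parameter values here (n_x = 4 features, w and b set to zero) are made-up placeholders, just to show the shapes:

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large positive z -> close to 1; large negative z -> close to 0
print(sigmoid(10))   # ~0.99995
print(sigmoid(-10))  # ~0.0000454
print(sigmoid(0))    # 0.5

# Hypothetical parameters for an input with n_x = 4 features
w = np.zeros((4, 1))
b = 0.0
x = np.ones((4, 1))

y_hat = sigmoid(w.T @ x + b)   # prediction, a (1, 1) array
print(float(y_hat[0, 0]))      # 0.5, since w = 0 and b = 0 give z = 0
```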

## Logistic Regression Cost Function

• To train the parameters w and b, we need a cost function.
• The superscript (i) refers to the i-th example in the training set.
• The loss function tells us how well our algorithm is working (in this case, the algorithm is logistic regression).
• The squared-error loss is simply the squared difference between the actual and predicted values.
• We usually do not use the squared-error loss in logistic regression, because it makes the optimization problem non-convex: gradient descent can get stuck in one of several local optima instead of finding the global optimum.
• Instead, logistic regression uses the cross-entropy loss: L(ŷ, y) = −[y log ŷ + (1 − y) log(1 − ŷ)].
• With this loss, if y = 1 we want ŷ to be as large as possible, and if y = 0 we want ŷ to be as small as possible.
• The loss function measures how well the algorithm does on a single example from the training set, while the cost function J(w, b) is defined over the entire training set.
• The cost function is the average of the losses over all m training examples.
• We want to find values of the parameters w and b that minimize the overall cost function.
• Logistic regression can be viewed as a very small neural network.
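The loss and cost definitions above can be sketched directly. The toy labels and predictions below are invented just to exercise the formulas:

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for a single example (elementwise over arrays)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(Y_hat, Y):
    """Cost J: average of the losses over all m examples."""
    m = Y.shape[1]
    return float(np.sum(loss(Y_hat, Y)) / m)

# Toy labels and predictions for m = 4 examples, arranged column-wise
Y     = np.array([[1,   0,   1,   0  ]])
good  = np.array([[0.9, 0.1, 0.8, 0.2]])  # mostly agrees with Y
bad   = np.array([[0.3, 0.7, 0.4, 0.6]])  # mostly disagrees with Y

print(cost(good, Y))  # small cost
print(cost(bad, Y))   # larger cost
```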

## Gradient Descent

• We need to find the values of w and b that minimize the cost function J(w, b).
• w can be high-dimensional, but for plotting we treat it as a single real number.
• The function J is convex, so we are able to find its minimum.
• The main reason we use this particular cost function is that it is convex.
• To find good values of the parameters w and b that bring the overall cost function to its minimum, we start by setting w and b to some initial values.
• We can use any initialization method for the parameters: because the function is convex, gradient descent reaches the same minimum from any starting point, so there is no need to worry about it.
• Gradient descent starts from the initial values of the parameters and repeatedly takes a step in the direction of steepest descent, gradually moving toward the global optimum.
• Each iteration repeats the update shown in the lecture: the parameters are adjusted by the derivative of the cost function, scaled by alpha.
• Alpha (α) is the learning rate. It controls how big a step we take on each iteration of gradient descent.
• In code, `dw` denotes the variable that holds the derivative dJ/dw.
• If the derivative is negative, the update moves w in the positive direction; if the derivative is positive, w moves in the negative direction. Either way, w moves toward the minimum.
• In calculus, a different symbol (∂) is used for the derivative of a function of two variables. Since J depends on both w and b, these are partial derivatives.
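The update rule above can be demonstrated on a simple convex function. This toy example (my own choice, J(w) = (w − 3)², not from the course) shows how the sign of the derivative pushes w toward the minimum from either side:

```python
# Gradient descent on J(w) = (w - 3)^2, whose derivative is
# dJ/dw = 2 * (w - 3). The minimum is at w = 3.
alpha = 0.1   # learning rate (an assumed value)
w = 0.0       # initial value; any start works since J is convex

for _ in range(100):
    dw = 2 * (w - 3)      # derivative at the current w (negative while w < 3)
    w = w - alpha * dw    # update rule: w := w - alpha * dJ/dw

print(round(w, 4))  # converges very close to 3
```

Starting below the minimum, dw is negative, so the update increases w; starting above, dw is positive and w decreases, matching the bullet above.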

## Computation Graph

• A computation graph is used to find the derivatives of complex functions.
• The computations of a neural network are organized into a forward propagation step and a backward propagation step.
• In forward propagation, we compute the output of the neural network.
• In the backward propagation step, we compute the gradients, or derivatives.
• The computation graph explains why the computation is organized in this manner.
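A tiny worked example in the spirit of the lecture (the function J = 3(a + bc) and its values are my illustration, not necessarily the exact figure): the forward pass computes J left to right, and the backward pass computes derivatives right to left using the chain rule.

```python
# Computation graph: u = b*c, v = a + u, J = 3*v
a, b, c = 5.0, 3.0, 2.0

# Forward propagation: compute the output step by step
u = b * c        # 6
v = a + u        # 11
J = 3 * v        # 33

# Backward propagation: chain rule, right to left
dJ_dv = 3.0                # J = 3*v  ->  dJ/dv = 3
dJ_du = dJ_dv * 1.0        # v = a + u  ->  dv/du = 1
dJ_da = dJ_dv * 1.0        # dv/da = 1
dJ_db = dJ_du * c          # u = b*c  ->  du/db = c
dJ_dc = dJ_du * b          # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)  # 33.0 3.0 6.0 9.0
```

One forward pass yields the output, and one backward pass yields every derivative, which is why the computation is organized this way.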

## Derivatives with a Computation Graph

• The lecture's figure shows a computation graph: the function is broken into elementary operations, computed left to right.
• It also shows how, moving right to left through the graph, we can find the derivatives needed for gradient descent on a single training example.
• There are methods, such as vectorization, for avoiding explicit for-loops over the training set.
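As a preview of vectorization, one gradient-descent step for logistic regression can be written over all m examples at once, with no explicit loop over the training set. The data here is random placeholder data and the variable names (`Z`, `A`, `dZ`, `dw`, `db`) follow common convention, not necessarily the course's exact code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: n_x = 3 features, m = 5 examples, columns are examples
np.random.seed(0)
X = np.random.randn(3, 5)
Y = np.array([[1, 0, 1, 1, 0]])
w = np.zeros((3, 1))
b = 0.0
m = X.shape[1]

# One vectorized gradient-descent step -- no explicit for-loop over examples
Z = w.T @ X + b            # (1, m): linear part for all examples at once
A = sigmoid(Z)             # (1, m): predictions y-hat for all examples
dZ = A - Y                 # (1, m): derivative of the cost w.r.t. Z
dw = (X @ dZ.T) / m        # (n_x, 1): gradient for the weights
db = float(np.sum(dZ) / m) # scalar: gradient for the bias

alpha = 0.01
w = w - alpha * dw
b = b - alpha * db
print(dw.shape)  # (3, 1)
```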