Logistic regression is widely used in machine learning and statistics. It works well for both binary and multiclass classification, and I have written tutorials on both before. This article focuses on image classification with logistic regression.

If you are totally new to logistic regression, please go to this article first. This article has a detailed explanation of how a simple logistic regression algorithm works.

It will help if you are already familiar with logistic regression. If not, I hope you will still be able to follow the concepts here; I have tried to explain them clearly.

If you are reading this to learn, the best way is to run all of the code yourself.

## Problem Statement

The idea of this project is to develop and train a model that is able to take the pixel values of a digit and identify if it is an image of the digit one or not.

This tutorial uses the well-known digits dataset that appears in many machine learning tutorials. Each row of the dataset holds the flattened pixel values of one digit. I will show this in detail later.

## Data Preparation

This dataset contains the pixel values of the digits from zero to nine. But because this tutorial is about binary classification, the goal of this model will be to return 1 if the digit is one and 0 otherwise. Please feel free to download the dataset from the link below to follow along.

Here I am importing the dataset:

    import pandas as pd
    import numpy as np

    # The pixel values are stored in the sheet named 'X'; the file has no header row
    df = pd.read_excel('ex3d1.xlsx', sheet_name='X', header=None)
    df.head()

You can see that the dataset has 400 columns. That means each row has 400 pixel values, and each row represents one digit. Let’s check some of the digits using the ‘imshow’ function of the matplotlib library. The pixel values of an image are not one-dimensional originally, which is why each row is reshaped into a 20 x 20 two-dimensional array before it is passed to ‘imshow’.

    import matplotlib.pyplot as plt

    # Reshape the 400 flattened pixel values of one row back into a 20 x 20 image
    plt.imshow(np.array(df.iloc[500, :]).reshape(20, 20))

It’s one! Here I used the row at index 500 of the dataset.

Here is another one, using the row at index 1750 of the dataset:

    plt.imshow(np.array(df.iloc[1750, :]).reshape(20, 20))

It’s three.
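
To make the reshaping step concrete, here is a minimal sketch that uses a made-up 3 x 3 "image" instead of the real 20 x 20 digits. The array values are arbitrary and only illustrate that flattening and reshaping undo each other.

```python
import numpy as np

# A hypothetical 3 x 3 image (values are arbitrary, for illustration only)
image = np.array([[0, 1, 0],
                  [0, 1, 0],
                  [0, 1, 0]])

# Flattening turns the 2-D image into a single row, like one row of the dataset
flat = image.reshape(-1)
print(flat.shape)  # (9,)

# Reshaping the row restores the original 2-D layout
restored = flat.reshape(3, 3)
print(np.array_equal(image, restored))  # True
```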

Let’s check how many rows are in this dataset:

    len(df)

Output:

    5000

Labels are stored in a different sheet of this Excel file. Here are the labels:

    df_y = pd.read_excel('ex3d1.xlsx', sheet_name='y', header=None)
    df_y.head()

I am only showing the head of the dataset, which is the first five rows. Because this model will identify the digit 1 only, it should return 1 if the digit is 1 and 0 otherwise. So in the labels I will keep 1 as it is and convert every other digit to 0:

    # Work on a copy so that the original labels in df_y stay unchanged
    y = df_y[0].copy()

    # Keep label 1 and set the label of every other digit to 0
    y[y != 1] = 0

    y = pd.DataFrame(y)
    y

Out of these 5000 rows of data, 4000 rows will be used to train the model, and the remaining 1000 rows will be used to test it. It is important for any machine learning or deep learning model to be tested on data that it has not seen during training.

    x_train = df.iloc[0:4000].T
    y_train = y.iloc[0:4000].T

    x_test = df.iloc[4000:].T
    y_test = y.iloc[4000:].T

Using .T, we take the transpose of each dataset, so that each column holds one example. These training and test datasets are still DataFrames. They need to be arrays for the convenience of calculation.

    x_train = np.array(x_train)
    y_train = np.array(y_train)
    x_test = np.array(x_test)
    y_test = np.array(y_test)
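
The split above takes the first 4000 rows for training. If the rows are grouped by digit (the two images shown earlier hint at this, but it is only an assumption here), the last 1000 rows may contain few or no ones. A minimal sketch of a shuffled split on small synthetic arrays follows; `features`, `labels`, and the `_demo` names are stand-ins, not the real data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the shuffle is repeatable

# Stand-in data: 10 examples with 4 features each, labels grouped in order
features = np.arange(40).reshape(10, 4)
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1]).reshape(10, 1)

# Shuffle the row indices, then apply the same order to features and labels
order = rng.permutation(len(features))
n_train = 8

x_train_demo = features[order[:n_train]].T   # shape (4, 8): one column per example
y_train_demo = labels[order[:n_train]].T     # shape (1, 8)
x_test_demo = features[order[n_train:]].T    # shape (4, 2)
y_test_demo = labels[order[n_train:]].T      # shape (1, 2)

print(x_train_demo.shape, y_train_demo.shape, x_test_demo.shape, y_test_demo.shape)
```

Using one shared index order keeps every example paired with its own label after the shuffle.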

The training and test datasets are ready to be used in the model. This is the time to develop the model.

## Step 1

Logistic regression builds on the basic linear formula that we all learned in high school:

Y = AX + B

Here Y is the output, X is the input or independent variable, A is the slope, and B is the intercept.

In logistic regression, the variables are expressed in this way:

**Formula 1**

$$z = w^T x + b$$

Here z is the output variable and x is the input variable. w and b are initialized as zeros, and they are updated in every iteration while the model is trained.
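
As a small illustration of Formula 1, the sketch below computes z for three made-up examples with two features each. The numbers are arbitrary; the point is the shapes, with one weight per feature and one column per example, as in the transposed arrays prepared earlier.

```python
import numpy as np

# Two features, three examples (one column per example); values are arbitrary
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
w_demo = np.array([[0.5],
                   [-1.0]])   # one weight per feature, shape (2, 1)
b_demo = 2.0

# z = w^T x + b, computed for all examples at once
z_demo = np.dot(w_demo.T, X_demo) + b_demo

print(z_demo)        # the three values are -1.5, -2.0 and -2.5
print(z_demo.shape)  # (1, 3): one z per example
```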

This output z is passed through a non-linear function. The commonly used non-linear function is the sigmoid function, which returns a value between 0 and 1.

**Formula 2**

$$a = \sigma(z)$$

As a reminder, the formula for the sigmoid function is:

**Formula 3**

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

This ‘a’ is the model’s predicted output, a value between 0 and 1 that is compared with the true label in ‘y_train’ or ‘y_test’.

Here is the function to define the sigmoid function for later use:

    def sigmoid(z):
        # Squash z into a value between 0 and 1
        s = 1 / (1 + np.exp(-z))
        return s
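
To get a feel for the sigmoid function, here is a quick check on three hand-picked inputs. The function is repeated so that the snippet runs on its own, and the input values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z_values = np.array([-10.0, 0.0, 10.0])
a_values = sigmoid(z_values)

# A large negative z gives a value near 0, z = 0 gives exactly 0.5,
# and a large positive z gives a value near 1
print(a_values)
```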

As we mentioned before, w and b will be initialized as zeros. One w value will be initialized for each pixel value. Let’s define a function to initialize the zero value for w and b:

    def initialize_with_zeros(dim):
        # One weight per pixel value, all starting at zero
        w = np.zeros(shape=(dim, 1))
        b = 0
        return w, b

## Cost Function

The cost function measures how much the predicted output differs from the original output. Here is the formula for the cost of one training example, that is, one row of data:

**Formula 4**

$$L(a, y) = -\big(y \log(a) + (1 - y) \log(1 - a)\big)$$

The average cost over all m rows is:

$$J = \frac{1}{m} \sum_{i=1}^{m} L\big(a^{(i)}, y^{(i)}\big)$$

The aim of the model is to lower this cost.
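
A short worked example may help here. The sketch below evaluates the per-example cost for a confident correct prediction and a confident wrong one, then averages over three made-up examples. `example_cost` is a hypothetical helper written for this illustration.

```python
import numpy as np

def example_cost(a, y):
    # Cost of one prediction a for true label y (the cross-entropy form of Formula 4)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

good = example_cost(0.9, 1)   # confident and correct: small cost
bad = example_cost(0.1, 1)    # confident and wrong: large cost
print(round(good, 3), round(bad, 3))  # 0.105 2.303

# The average cost over several examples is the mean of the individual costs
A_demo = np.array([0.9, 0.1, 0.2])
Y_demo = np.array([1, 1, 0])
average_cost = np.mean(example_cost(A_demo, Y_demo))
print(round(average_cost, 3))  # 0.877
```

The wrong prediction costs roughly twenty times as much as the correct one, which is what pushes the model away from confident mistakes.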

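## Gradient descent

We need to update the variables w and b of Formula 1. They are initialized as zeros, but they need to be updated later with more appropriate values. Gradient descent will help with that. Let’s see how.

In Formula 4, we expressed the cost function as a function of ‘a’ and ‘y’. But it can be expressed as a function of ‘w’ and ‘b’ as well, because ‘a’ is derived using ‘w’ and ‘b’.

The gradients for ‘w’ and ‘b’ are obtained by taking the partial derivatives of the average cost J with respect to ‘w’ and ‘b’. In the formulas below, X holds one example per column, A holds the predicted outputs, Y holds the true labels, and m is the number of examples.

**Formula 5 and Formula 6**

$$dw = \frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$

$$db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \big(a^{(i)} - y^{(i)}\big)$$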
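
One way to build confidence in the gradient expressions is to compare them with a numerical estimate. The sketch below does that on random made-up data, using the same expressions for ‘dw’ and ‘db’ that appear in the ‘propagate’ function later in this article. `cost_fn`, `analytic_gradients`, and the `_demo` values are illustrative names, not part of the tutorial’s model.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(w, b, X, Y):
    # Average cross-entropy cost over all m examples
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

def analytic_gradients(w, b, X, Y):
    # Gradients of the average cost with respect to w and b
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return dw, db

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(3, 5))            # 3 features, 5 examples
Y_demo = rng.integers(0, 2, size=(1, 5))    # random 0/1 labels
w_demo = rng.normal(size=(3, 1))
b_demo = 0.3

dw, db = analytic_gradients(w_demo, b_demo, X_demo, Y_demo)

# Central-difference estimate of each partial derivative
eps = 1e-6
dw_numeric = np.zeros_like(w_demo)
for j in range(w_demo.shape[0]):
    step = np.zeros_like(w_demo)
    step[j, 0] = eps
    dw_numeric[j, 0] = (cost_fn(w_demo + step, b_demo, X_demo, Y_demo)
                        - cost_fn(w_demo - step, b_demo, X_demo, Y_demo)) / (2 * eps)
db_numeric = (cost_fn(w_demo, b_demo + eps, X_demo, Y_demo)
              - cost_fn(w_demo, b_demo - eps, X_demo, Y_demo)) / (2 * eps)

# Both comparisons should report a match within a small tolerance
print(np.allclose(dw, dw_numeric, atol=1e-6), np.isclose(db, db_numeric, atol=1e-6))
```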

Now that we have all the formulas, let’s put it all together in a function called ‘propagate’:

    def propagate(w, b, X, Y):
        # Find the number of training examples
        m = X.shape[1]

        # Calculate the predicted output
        A = sigmoid(np.dot(w.T, X) + b)

        # Calculate the cost function
        cost = -1/m * np.sum(Y*np.log(A) + (1-Y) * np.log(1-A))

        # Calculate the gradients
        dw = 1/m * np.dot(X, (A-Y).T)
        db = 1/m * np.sum(A-Y)

        grads = {"dw": dw,
                 "db": db}

        return grads, cost

The propagate function calculates the predicted output that is ‘A’, cost function ‘cost’, and the gradients ‘dw’ and ‘db’. Using this function, we can now update ‘w’ and ‘b’ in Formula 1. That is the next step.

## Optimize the parameters to best fit the training data

In this step, we will update the parameters, which are the core of this model. The ‘propagate’ function will be run for a number of iterations, and in each iteration ‘w’ and ‘b’ will be updated. Below is a complete ‘optimize’ function. **I explained each step in the code snippet. Please read carefully.**

This function introduces a new term, the learning rate. It is not a calculated value. It is a number you choose that controls how large each update to ‘w’ and ‘b’ is, and a good value differs from problem to problem. Try a few different learning rates to see which works best.

    def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
        costs = []

        # The propagate function will run for a number of iterations
        for i in range(num_iterations):
            grads, cost = propagate(w, b, X, Y)
            dw = grads["dw"]
            db = grads["db"]

            # Update w and b by subtracting dw and db times
            # the learning rate from the previous w and b
            w = w - learning_rate * dw
            b = b - learning_rate * db

            # Record the cost function value every 100 iterations
            if i % 100 == 0:
                costs.append(cost)
                if print_cost:
                    print("Cost after iteration {}: {}".format(i, cost))

        # The final updated parameters
        params = {"w": w,
                  "b": b}

        # The final updated gradients
        grads = {"dw": dw,
                 "db": db}

        return params, grads, costs
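
The effect of the learning rate is easier to see on a one-variable toy problem than on the full model. The sketch below minimizes f(w) = w² with the same update rule, w = w - learning_rate * dw. `gradient_descent_1d` is a hypothetical helper, and the three learning rates are arbitrary picks for illustration.

```python
def gradient_descent_1d(learning_rate, num_iterations=20, start=1.0):
    # Minimize f(w) = w**2, whose gradient is 2*w; the minimum is at w = 0
    w = start
    for _ in range(num_iterations):
        dw = 2 * w
        w = w - learning_rate * dw
    return w

small_step = gradient_descent_1d(0.01)   # moves toward 0, but slowly
good_step = gradient_descent_1d(0.3)     # ends up very close to 0
too_large = gradient_descent_1d(1.1)     # overshoots and moves away from 0

print(small_step, good_step, too_large)
```

A rate that is too small wastes iterations, and a rate that is too large makes the updates overshoot, which is why trying a few values is worthwhile.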

We have the function to optimize the parameters. This is the time to predict the output:

**Explanation of each line of code is embedded in between the code snippet. Please read carefully to understand it well.**

    def predict(w, b, X):
        m = X.shape[1]

        # Initialize an array of zeros with one entry per input example
        # These zeros will be replaced by the predicted output
        Y_prediction = np.zeros((1, m))
        w = w.reshape(X.shape[0], 1)

        # Calculate the predicted output using Formula 1 and the sigmoid
        # This will return values from 0 to 1
        A = sigmoid(np.dot(w.T, X) + b)

        # Iterate through A and predict 1 if the value of A
        # is greater than 0.5 and 0 otherwise
        for i in range(A.shape[1]):
            Y_prediction[:, i] = (A[:, i] > 0.5) * 1

        return Y_prediction
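
The loop in ‘predict’ can also be written as a single comparison over the whole array. Here is a minimal sketch with made-up probabilities; the values in `A_demo` are arbitrary.

```python
import numpy as np

A_demo = np.array([[0.1, 0.5, 0.51, 0.99]])

# The comparison gives booleans; astype(float) turns them into 1.0 and 0.0
Y_prediction_demo = (A_demo > 0.5).astype(float)

print(Y_prediction_demo)  # [[0. 0. 1. 1.]]
```

Note that a value of exactly 0.5 is not greater than 0.5, so it is predicted as 0, the same as in the loop version.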

## Final Model

Putting all the functions together, the final model will look like this:

    def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
        # Initialize w and b as zeros
        w, b = initialize_with_zeros(X_train.shape[0])

        # Update w and b with gradient descent
        parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
        w = parameters["w"]
        b = parameters["b"]

        # Predict the output for both the test and the training set
        Y_prediction_test = predict(w, b, X_test)
        Y_prediction_train = predict(w, b, X_train)

        # Calculate the training and test set accuracy by comparing
        # the predicted output and the original output
        print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
        print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

        d = {"costs": costs,
             "Y_prediction_test": Y_prediction_test,
             "Y_prediction_train": Y_prediction_train,
             "w": w,
             "b": b,
             "learning_rate": learning_rate,
             "num_iterations": num_iterations}

        return d

The complete logistic regression model is ready!
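
The accuracy expression used in the ‘model’ function may look unusual at first. Because the predictions and the labels are all 0 or 1, the absolute difference is 1 for every mistake and 0 otherwise, so its mean is the error rate. The sketch below checks this on five made-up labels; the `_demo` arrays are arbitrary.

```python
import numpy as np

Y_true_demo = np.array([[1, 0, 0, 1, 0]])
Y_pred_demo = np.array([[1, 0, 1, 1, 0]])   # one mistake out of five

# Same expression as in the 'model' function
accuracy_from_formula = 100 - np.mean(np.abs(Y_pred_demo - Y_true_demo)) * 100

# Direct share of matching predictions, for comparison
accuracy_from_count = np.mean(Y_pred_demo == Y_true_demo) * 100

print(accuracy_from_formula, accuracy_from_count)  # both are 80.0
```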

## Using the model

Now it is time to use the model and see how well it works. Let’s pass the data that we prepared at the beginning to the model:

    d = model(x_train, y_train, x_test, y_test, num_iterations=2000, learning_rate=0.005)

Output:

    train accuracy: 99.75 %
    test accuracy: 99.5 %

Isn’t the accuracy just excellent!

As you can see from the ‘model’ function, our final model returns a dictionary that contains the costs, final parameters, predicted outputs, learning rate, and the number of iterations used. Let’s see how the cost changed as ‘w’ and ‘b’ were updated:

    plt.figure(figsize=(7, 5))

    # One point per recorded cost; a cost is recorded every 100 iterations
    plt.scatter(x=range(len(d['costs'])), y=d['costs'], color='black')
    plt.title('Scatter Plot of Cost Functions', fontsize=18)
    plt.xlabel('Iterations (in hundreds)', fontsize=12)
    plt.ylabel('Costs', fontsize=12)
    plt.show()

Look, the cost went down over the iterations, as it should. That means the parameters ‘w’ and ‘b’ kept improving as training went on.

## Conclusion

If you could run all the code and understand most of it, you have just learned how logistic regression works. Congratulations! If this model is not completely clear to you yet, I suggest breaking down the functions and running each line of code individually. That should give you a better idea. Please feel free to ask questions in the comment section if you have any problems.

