## Multiclass Classification With Logistic Regression One vs All Method From Scratch Using Python

#### In this article, learn how to develop an algorithm using Python for multiclass classification with logistic regression one vs all method described in week 4 of Andrew Ng’s machine learning course in Coursera.

Logistic regression is a very popular machine learning technique. We use logistic regression when the dependent variable is categorical. This article will primarily focus on the implementation of logistic regression. I am assuming that you already know how to implement a binary classification with Logistic Regression. If not, please see the links at the end to learn the concepts of machine learning and the implementation of Linear regression and the basic logistic regression.

The implementation of multiclass classification follows the same ideas as binary classification. As you know, in binary classification we encode the two classes as 1 and 0. In the one vs all method, when we work with a class, that class is denoted by 1 and all the other classes become 0. It will be clearer once you implement it. I suggest you keep coding and running the code as you read.

#### Python Implementation

Here I am going to show the implementation step by step.

1. Import the necessary packages and the dataset. I took the dataset from Andrew Ng’s Machine Learning course in Coursera. This is a handwriting recognition dataset. The labels run from 1 to 10 (in the course dataset, the digit 0 is labeled as 10). From the dataset of pixels, we need to recognize the digit.

import pandas as pd
import numpy as np
xl = pd.ExcelFile('ex3d1.xlsx')
# Input pixels; the sheet name 'X' is an assumption, adjust it to match your file
df = pd.read_excel(xl, 'X', header=None)
# Labels
y = pd.read_excel(xl, 'y', header=None)

2. Define the hypothesis that takes the input variables and theta. It returns the calculated output variable.

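def hypothesis(theta, X):
    # Subtract a tiny constant so the output never reaches exactly 1,
    # which would break log(1 - h) in the cost function
    return 1 / (1 + np.exp(-(np.dot(theta, X.T)))) - 0.0000001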
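
To see what the hypothesis computes, here is a minimal sketch on a toy input. The helper name `sigmoid` and the toy arrays are illustrative assumptions, not part of the original code, and the sketch leaves out the small constant subtracted above.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Toy data (assumed values): 3 samples, each with a bias term and 2 features
X_toy = np.array([[1.0, 0.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [1.0, -2.0, -1.0]])
theta_toy = np.array([0.0, 1.0, 1.0])

# Same computation as hypothesis(theta, X): sigmoid of theta . x for every row
h_toy = sigmoid(np.dot(theta_toy, X_toy.T))
print(h_toy)
```

A weighted sum of zero gives a probability of exactly 0.5, and positive or negative sums push the output toward 1 or 0 respectively.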

3. Build the cost function that takes the input variables, output variable, and theta. It returns the cost of the hypothesis. In other words, it measures how far the predictions are from the actual outputs.

def cost(X, y, theta):
    # hypothesis expects theta first, then X
    y1 = hypothesis(theta, X)
    return -(1/len(X)) * np.sum(y*np.log(y1) + (1-y)*np.log(1-y1))
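
As a rough illustration of how this cost behaves, the sketch below computes it for good and bad predictions on made-up labels. The function name `log_loss`, the `eps` clipping parameter, and the toy arrays are assumptions for the example; clipping is a swapped-in way to keep the logarithm finite, not the approach used in the code above.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-7):
    """Average logistic-regression cost for 0/1 labels and predicted probabilities."""
    # Keep probabilities away from exactly 0 and 1 so np.log stays finite
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])   # mostly right
wrong = np.array([0.1, 0.9, 0.2, 0.8])       # mostly wrong

loss_good = log_loss(y_true, confident)
loss_bad = log_loss(y_true, wrong)
print(loss_good, loss_bad)
```

Predictions close to the true labels give a small cost, while confident wrong predictions are penalized heavily.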

4. Now, it’s time for data preprocessing. The data is clean, so not much preprocessing is required. We need to add a bias column to the input variables. Please check the lengths of df and y: if they differ, the model will not work.

print(len(df))
print(len(y))
X = pd.concat([pd.Series(1, index=df.index, name='00'), df], axis=1)
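
To make the effect of the bias column concrete, here is the same `pd.concat` call applied to a tiny made-up DataFrame. The names `df_toy` and `X_toy` and the values are assumptions for illustration only.

```python
import pandas as pd

# Toy input: 2 samples with 2 features each (assumed values)
df_toy = pd.DataFrame({0: [0.2, 0.5], 1: [0.7, 0.1]})

# Prepend a column of ones named '00', exactly as done for df above
X_toy = pd.concat([pd.Series(1, index=df_toy.index, name='00'), df_toy], axis=1)
print(X_toy)
```

The resulting frame has one more column than the input, and that first column multiplies the bias term in theta.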

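5. The y column has the labels from 1 to 10. That means we have 10 classes. We will make one column for each of the classes with the same length as y. For example, for class 5, the column has 1 in every row where y is 5 and 0 otherwise. We will do it programmatically with some simple code:

# y was read as a one-column DataFrame; take that column as a Series
y = y.iloc[:, 0]
classes = y.unique()
# One column per class, initialized to 0
y1 = pd.DataFrame(np.zeros([df.shape[0], len(classes)]))
for i in range(0, len(classes)):
    for j in range(0, len(y1)):
        if y[j] == classes[i]:
            y1.iloc[j, i] = 1
        else:
            y1.iloc[j, i] = 0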
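
The same one vs all encoding can also be built without explicit loops. The sketch below uses NumPy broadcasting on a small made-up label series; the names `y_toy`, `classes`, and `y1_toy` are assumptions, and unlike the loop above it sorts the class labels first.

```python
import numpy as np
import pandas as pd

# Toy labels (assumed values) standing in for the y column
y_toy = pd.Series([3, 1, 2, 3, 1])

# Sorted unique class labels: one output column per class
classes = np.sort(y_toy.unique())

# Compare every label against every class at once: shape (n_samples, n_classes)
one_vs_all = (y_toy.to_numpy()[:, None] == classes[None, :]).astype(int)
y1_toy = pd.DataFrame(one_vs_all, columns=classes)
print(y1_toy)
```

Each row contains exactly one 1, in the column of that sample's class, which is what each binary classifier needs as its target.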

6. Now define the function ‘gradient_descent’. It takes the input variables, output variable, theta, alpha, and the number of epochs as parameters. Here, alpha is the learning rate; choose it to suit your problem. A learning rate that is too small makes convergence slow, and one that is too large can prevent the algorithm from converging. It may take a few tries to find the right learning rate. For each of the columns in y1, we will implement a binary classification.

def gradient_descent(X, y, theta, alpha, epochs):
    m = len(X)
    theta = pd.DataFrame(theta)
    for i in range(0, epochs):
        # One binary classifier per class (one column of y and of theta)
        for j in range(0, 10):
            h = hypothesis(theta.iloc[:,j], X)
            for k in range(0, theta.shape[0]):
                theta.iloc[k, j] -= (alpha/m) * np.sum((h-y.iloc[:, j])*X.iloc[:, k])
    # Return only the learned parameters
    return theta
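
The element-by-element loop above can be slow on a dataset of this size. As a hedged alternative, the sketch below performs the same kind of update for all classes and all parameters at once using matrix operations. The function name `gradient_descent_vectorized`, the helper `sigmoid`, and the toy data are assumptions, and the sketch works on plain NumPy arrays rather than DataFrames.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent_vectorized(X, Y, theta, alpha, epochs):
    """X: (m, n) inputs with bias column, Y: (m, K) one vs all labels, theta: (n, K)."""
    m = X.shape[0]
    theta = theta.copy()  # leave the caller's array untouched
    for _ in range(epochs):
        H = sigmoid(X @ theta)                    # (m, K) predictions for every class
        theta -= (alpha / m) * (X.T @ (H - Y))    # gradient step for all parameters
    return theta

# Toy problem (assumed values): one feature, two classes split around zero
X_toy = np.array([[1, -2.0], [1, -1.0], [1, 1.0], [1, 2.0]])
Y_toy = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

theta_toy = gradient_descent_vectorized(X_toy, Y_toy, np.zeros((2, 2)), alpha=0.1, epochs=500)
pred = np.argmax(sigmoid(X_toy @ theta_toy), axis=1)
print(pred)
```

Each column of `theta` is still an independent binary classifier; the matrix product only removes the Python-level loops over classes and parameters.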

7. Initialize the theta. Remember, we will implement logistic regression for each class. There will be a series of theta for each class as well.

theta = np.zeros([df.shape[1]+1, y1.shape[1]])
theta = gradient_descent(X, y1, theta, 0.02, 1500)

8. With this updated theta, calculate the output variable.

output = []
for i in range(0, 10):
    theta1 = pd.DataFrame(theta)
    h = hypothesis(theta1.iloc[:,i], X)
    output.append(h)
output=pd.DataFrame(output)

9. Compare the calculated output and the original output variable to calculate the accuracy of the model.

accuracy = 0
for col in range(0, 10):
    for row in range(len(y1)):
        if y1.iloc[row, col] == 1 and output.iloc[col, row] >= 0.5:
            accuracy += 1
accuracy = accuracy/len(X)
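
Another common way to score a one vs all model is to pick, for each sample, the class whose classifier gives the highest probability, and compare that with the true label. This is a different rule from the 0.5 threshold used above, shown here as a sketch on made-up numbers; `probs`, `classes`, `y_true`, and `predicted` are assumed names, with `probs` laid out like `output` (one row per class, one column per sample).

```python
import numpy as np

# Toy probabilities (assumed values): 3 classes (rows) x 4 samples (columns)
probs = np.array([[0.90, 0.20, 0.40, 0.10],
                  [0.05, 0.70, 0.35, 0.30],
                  [0.05, 0.10, 0.25, 0.60]])
classes = np.array([1, 2, 3])      # label that each row stands for
y_true = np.array([1, 2, 2, 3])    # true label of each sample

# For every sample, take the class with the highest predicted probability
predicted = classes[np.argmax(probs, axis=0)]

# Fraction of samples where the predicted label matches the true label
accuracy = np.mean(predicted == y_true)
print(predicted, accuracy)
```

With this rule every sample gets exactly one predicted class, even when no classifier reaches 0.5.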

The accuracy is 72%.

Please ask me if you have any questions. Check this GitHub page for the dataset.