Logistic Regression with Python Using An Optimization Function

Logistic regression is a powerful classification tool. It applies when the dependent variable is categorical. There are a few different ways to implement it. Today I will explain a simple way to perform binary classification using an optimization function that is available in Python.

Concepts and Formulas

Logistic regression uses a sigmoid function to estimate the output, which returns a value between 0 and 1. As this is binary classification, the final prediction should be either 0 or 1. Here is the sigmoid function:

h = sigmoid(z) = 1 / (1 + e^(-z))

Here z is the product of the input variable X and the coefficients theta, z = X * theta, where theta is initialized before training and then optimized.
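
For intuition, the sigmoid squashes any real-valued z into the range (0, 1). Here is a quick, purely illustrative check (not part of the original walkthrough):

import numpy as np
for z in (-10, 0, 10):
    print(z, 1 / (1 + np.exp(-z)))  # prints a value near 0, exactly 0.5, and a value near 1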

One theta value needs to be initialized for each input feature. Another very important piece is the cost function, which gives an idea of how far the prediction is from the actual output. Here is the formula for the cost function:

J(theta) = -(1/m) * sum( y * log(h) + (1 - y) * log(1 - h) )

Here, y is the original output variable and h is the predicted output variable. Our goal is to minimize the cost as much as possible. Now we need to update the theta values so that our prediction is as close as possible to the original output variable. If we take the partial derivative of the cost function with respect to theta, we get the gradient for the theta values. I am not going through the calculus here. The gradient that will be used to update theta comes out to be:

gradient = (1/m) * X^T * (h - y)

If you did not understand all the equations, do not worry about it yet. Look at the implementation below; hopefully it will become clear how each equation is used.

Python Implementation of Logistic Regression

1. Import the necessary packages and the dataset. I found this dataset in Andrew Ng’s machine learning course on Coursera.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('ex2data1.txt', header=None)
df.head()

2. Separate the input variables from the output variable. In this dataset, columns 0 and 1 are the input variables and column 2 is the output variable. So, we will have to predict column 2.

X = df.iloc[:, :-1]
y = df.iloc[:, -1]

3. Add a bias column to X. The value of the bias column is set to one for every row.

X = np.c_[np.ones((X.shape[0], 1)), X]
X[:10]

4. Here, our X is a two-dimensional array and y is a one-dimensional array. Let’s make the ‘y’ two-dimensional to match the dimensions.

y = y.values[:, np.newaxis]
y[:10]

5. Define the sigmoid function

def sigmoid(x, theta):
    # linear combination of inputs and coefficients, squashed into (0, 1)
    z = np.dot(x, theta)
    return 1/(1+np.exp(-z))

6. Use this sigmoid function to write the hypothesis function that will predict the output:

def hypothesis(theta, x):
    return sigmoid(x, theta)

7. Write the definition of the cost function using the formula explained above.

def cost_function(theta, x, y):
    # cross-entropy cost averaged over all m training examples
    m = x.shape[0]
    h = hypothesis(theta, x)
    return -(1/m)*np.sum(y*np.log(h) + (1-y)*np.log(1-h))
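
As a quick sanity check (not in the original post): with theta set to all zeros, the hypothesis is 0.5 for every row, so the cost should come out to about 0.693, which is -log(0.5).

theta_check = np.zeros((X.shape[1], 1))
print(cost_function(theta_check, X, y))  # expect roughly 0.693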

8. Write the gradient descent function as per the equation above:

def gradient(theta, x, y):
    # partial derivative of the cost with respect to each theta
    m = x.shape[0]
    h = hypothesis(theta, x)
    return (1/m) * np.dot(x.T, (h-y))
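
If you want to double-check that this gradient matches the cost function, a finite-difference comparison (purely illustrative, not part of the original walkthrough) could look like this:

def numerical_gradient(theta, x, y, eps=1e-5):
    # approximate each partial derivative of the cost with central differences
    grad = np.zeros_like(theta, dtype=float)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta, dtype=float)
        step[j] = eps
        grad[j] = (cost_function(theta + step, x, y) - cost_function(theta - step, x, y)) / (2 * eps)
    return grad

For any test value of theta, numerical_gradient(theta, X, y) should be very close to gradient(theta, X, y).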

9. Import an optimization function that will optimize theta for us. This optimizer takes the function to minimize, the gradient function, and the arguments to pass to them as inputs. In this problem, the function to minimize is the cost function, the gradient function is the gradient we defined above, and the arguments are X and y. The optimizer also takes ‘x0’, the initial value of the parameters to be optimized. In our case, that is theta, so we have to initialize it; I initialized the theta values as zeros. As I mentioned earlier, we need one theta value for each input feature. If you look at X, we have columns 0 and 1 plus the bias column we added, so we need to initialize three theta values.

from scipy.optimize import fmin_tnc

theta = np.zeros((X.shape[1], 1))

def fit(x, y, theta):
    opt_weights = fmin_tnc(func=cost_function, x0=theta, fprime=gradient, args=(x, y.flatten()))
    return opt_weights[0]

parameters = fit(X, y, theta)

The parameters came out to be [-25.16131854, 0.20623159, 0.20147149].
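
Note that fmin_tnc is an older SciPy interface. The same optimization can also be expressed through scipy.optimize.minimize; this is a sketch that assumes the cost_function and gradient defined above, with method='TNC' mirroring fmin_tnc:

from scipy.optimize import minimize

res = minimize(fun=cost_function, x0=theta.flatten(), args=(X, y.flatten()),
               jac=gradient, method='TNC')
print(res.x)  # should come out close to the parameters above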

10. Use these parameters as the theta values in the hypothesis function to calculate the final hypothesis.

h = hypothesis(parameters, X)

11. Use the hypothesis to predict the output variable:

def predict(h):
    h1 = []
    for i in h:
        if i>=0.5:
            h1.append(1)
        else:
            h1.append(0)
    return h1
y_pred = predict(h)
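
The same thresholding can also be written as a single vectorized expression (an equivalent alternative, not from the original post):

y_pred = (h >= 0.5).astype(int)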

12. Calculate the accuracy.

accuracy = 0
for i in range(len(y_pred)):
    if y_pred[i] == y[i][0]:
        accuracy += 1
accuracy/len(y)

The final accuracy is 89%.
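
For reference, the same number can be computed in one line with NumPy (an equivalent alternative to the loop above):

print(np.mean(np.array(y_pred) == y.flatten()))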

You can perform this logistic regression using plain gradient descent as the optimization routine as well, as sketched below.
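
Here is a minimal sketch of what that could look like, reusing the cost_function and gradient defined above. The learning rate and iteration count are illustrative guesses, not values from this post; with these unscaled exam scores as features, plain gradient descent converges slowly, so feature scaling or careful tuning of alpha would be needed to match the fmin_tnc result.

def gradient_descent(x, y, theta, alpha=0.001, iterations=100000):
    # repeatedly move theta a small step against the gradient of the cost
    for _ in range(iterations):
        theta = theta - alpha * gradient(theta, x, y)
    return theta

theta_gd = gradient_descent(X, y.flatten(), np.zeros(X.shape[1]))
print(cost_function(theta_gd, X, y.flatten()))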

#MachineLearning #LogisticRegression #Python
