Logistic regression is one of the most widely used classification algorithms. In a previous blog I covered its definition, uses, and types. In this article I want to focus on its functional side: first the cost function and gradient descent, then an implementation of the algorithm in Python, and finally the same model built with the scikit-learn library.
Cost Function
The cost function serves the same purpose as in linear regression. Does that mean the cost functions of linear and logistic regression are exactly the same? Not really, because the hypothesis is different. As a reminder, the hypothesis for logistic regression is shown below. This hypothesis equation is called the sigmoid function or logistic function.

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
This creates a problem: because the sigmoid function is not linear, plugging it into the squared-error cost of linear regression produces a non-convex cost surface. Gradient descent would not be guaranteed to reach the global minimum; instead it could get stuck in one of several local minima.
To solve this problem, logistic regression uses the following cost function instead:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

This cost is convex, so gradient descent can find the global minimum.
The gradient descent update rule looks the same as in linear regression. Repeat until convergence:

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
Now let’s dive into the most exciting part: the code implementation of logistic regression.
Logistic Regression With Python
First, let’s define the hypothesis and gradient descent functions.
import numpy as np

def hypothesis(theta, X, n):
    # sigmoid applied to theta^T x for every row of X
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n + 1)
    for i in range(0, X.shape[0]):
        h[i] = 1 / (1 + np.exp(-float(np.matmul(theta, X[i]))))
    h = h.reshape(X.shape[0])
    return h

def GD(theta, alpha, num_iters, h, X, y, n):
    # batch gradient descent: repeat the update rule num_iters times
    theta_history = np.ones((num_iters, n + 1))
    cost = np.ones(num_iters)
    for i in range(0, num_iters):
        theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
        for j in range(1, n + 1):
            theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * X.transpose()[j])
        theta_history[i] = theta
        h = hypothesis(theta, X, n)
        # cross-entropy cost at the current parameters
        cost[i] = (-1 / X.shape[0]) * sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    theta = theta.reshape(1, n + 1)
    return theta, theta_history, cost

def logistic_regression(X, y, alpha, num_iters):
    n = X.shape[1]
    # add the intercept column of ones
    one_column = np.ones((X.shape[0], 1))
    X = np.concatenate((one_column, X), axis = 1)
    # initializing the parameter vector...
    theta = np.zeros(n + 1)
    # hypothesis calculation....
    h = hypothesis(theta, X, n)
    # returning the optimized parameters by Gradient Descent...
    theta, theta_history, cost = GD(theta, alpha, num_iters, h, X, y, n)
    return theta, theta_history, cost
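Before applying these functions to real data, a quick smoke test on synthetic points (the data and hyperparameters here are made up purely for illustration) confirms that the cost decreases:
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])  # two clusters
y_demo = np.concatenate([np.zeros(20), np.ones(20)])
theta_demo, _, cost_demo = logistic_regression(X_demo, y_demo, alpha=0.1, num_iters=500)
print(cost_demo[0], cost_demo[-1])  # the final cost should be well below the initial one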
The functions above are adapted from this article written by Navoneel Chakrabarty. Now we will use them on a dataset and see how we can predict the labels of a test set.
First, let’s import the dataset in an IPython notebook. The dataset is available in my GitHub repository; please click here for the dataset.
import pandas as pd

df = pd.read_table('marks.txt', delimiter = ',')
X = df[['X1','X2']]
y = df['y']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# separate the two classes for plotting
x3 = X[y == 0]
x4 = X[y == 1]
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(x3['X1'], x3['X2'], c = 'r', label = 'Not Admitted')
plt.scatter(x4['X1'], x4['X2'], c = 'green', label = 'Admitted')
plt.xlabel("Marks obtained in 1st Exam")
plt.ylabel("Marks obtained in 2nd Exam")
plt.legend()
plt.show()
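So far we have only split and plotted the data; before predicting anything we need to actually train the model. The learning rate and iteration count below are illustrative choices, not tuned values (the exam marks are unscaled, so the learning rate must be kept small):
theta, theta_history, cost = logistic_regression(X_train.values, y_train.values, alpha=0.0001, num_iters=100000)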
def add_one_column(X):
    # prepend the intercept column of ones
    one_column = np.ones((X.shape[0], 1))
    return np.concatenate((one_column, X), axis = 1)

def predict(X, y, thr = 0.5):
    # returns the accuracy (%) at the given probability threshold
    X_new = add_one_column(X)
    n = X.shape[1]
    predicted_classes = (hypothesis(theta, X_new, n) >= thr).astype(int)
    accuracy = np.mean(predicted_classes == y)
    return accuracy * 100
predict(X_train, y_train)
61.33333333333333
predict(X_test, y_test)
56.00000000000001
for i in np.arange(0, 1, 0.05):
    print("Training Set Accuracy for " + str(i) + " is: " + str(predict(X_train, y_train, thr = i)))
    print("Test Set Accuracy for " + str(i) + " is: " + str(predict(X_test, y_test, thr = i)))
Training Set Accuracy for 0.5 is: 61.33333333333333
Test Set Accuracy for 0.5 is: 56.00000000000001
Training Set Accuracy for 0.55 is: 68.0
Test Set Accuracy for 0.55 is: 56.00000000000001
Training Set Accuracy for 0.6000000000000001 is: 80.0
Test Set Accuracy for 0.6000000000000001 is: 68.0
Training Set Accuracy for 0.65 is: 82.66666666666667
Test Set Accuracy for 0.65 is: 92.0
Training Set Accuracy for 0.7000000000000001 is: 77.33333333333333
Test Set Accuracy for 0.7000000000000001 is: 68.0
Training Set Accuracy for 0.75 is: 56.00000000000001
Test Set Accuracy for 0.75 is: 48.0
From the results above, the accuracy is best at a probability threshold of 0.65.
Logistic Regression with Optimized Cost Function
It is a good idea to start by writing pure Python code step by step to learn the basics and understand what runs behind the scenes. Now let’s look at a slightly more efficient and faster way of implementing the same algorithm.
Let’s start by defining the hypothesis; here it directly returns the predicted probability.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def hypothesis(theta, x):
    # predicted probability of the positive class
    return sigmoid(np.dot(x, theta))

def cost_function(theta, x, y):
    # average cross-entropy over the m training samples
    m = x.shape[0]
    total_cost = -(1 / m) * np.sum(
        y * np.log(hypothesis(theta, x)) + (1 - y) * np.log(
            1 - hypothesis(theta, x)))
    return total_cost

def gradient(theta, x, y):
    # analytic gradient of the cost with respect to theta
    m = x.shape[0]
    return (1 / m) * np.dot(x.T, sigmoid(np.dot(x, theta)) - y)
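Because hand-derived gradients are easy to get wrong, a quick sanity check (a sketch, with made-up data) is to compare the analytic gradient against scipy’s finite-difference estimate:
from scipy.optimize import check_grad

x_chk = np.hstack([np.ones((5, 1)), np.random.rand(5, 2)])  # intercept + 2 features
y_chk = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
err = check_grad(cost_function, gradient, np.zeros(3), x_chk, y_chk)
print(err)  # should be tiny, on the order of 1e-6 or smaller
With the functions in place, we prepare the data and optimize: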
X = np.c_[np.ones((X.shape[0], 1)), X]   # add the intercept column
y = y.values[:, np.newaxis]              # column vector of labels
theta = np.zeros((X.shape[1], 1))
from scipy.optimize import fmin_tnc

def fit(x, y, theta):
    # minimize the cost with scipy's truncated Newton optimizer
    opt_weights = fmin_tnc(func=cost_function, x0=theta,
                           fprime=gradient, args=(x, y.flatten()))
    return opt_weights[0]

parameters = fit(X, y, theta)
def predict(x):
    # probabilities computed from the optimized parameters
    theta = parameters[:, np.newaxis]
    return hypothesis(theta, x)

def accuracy(x, actual_classes, probab_threshold = 0.65):
    predicted_classes = (predict(x) >= probab_threshold).astype(int)
    predicted_classes = predicted_classes.flatten()
    accuracy = np.mean(predicted_classes == actual_classes)
    return accuracy * 100
accuracy(add_one_column(X_train), y_train)
accuracy(add_one_column(X_test), y_test)
Logistic Regression Using scikit-learn
Now I want to use the logistic regression model available in the scikit-learn library; it saves us from writing all of the above by hand.
We start by importing the LogisticRegression model from scikit-learn and fitting it on our training data. We can then use the trained model to predict the admitted / not-admitted labels and finally get the accuracy score.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
model = LogisticRegression()
model.fit(X_train, y_train)
predicted_classes = model.predict(X_train)
accuracy = accuracy_score(y_train,predicted_classes)
parameters = model.coef_
parameters, accuracy
(array([[0.0330462 , 0.02631392]]), 0.8133)
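Note that model.coef_ holds only the feature weights; the fitted intercept is stored separately in model.intercept_.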
Since model.fit returns the fitted estimator itself, we can also call score directly to evaluate both splits:
model_fit = model.fit(X_train, y_train)
model_fit.score(X_test, y_test)
model_fit.score(X_train, y_train)
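One difference worth noting: scikit-learn’s predict uses a fixed 0.5 threshold, while our hand-rolled model did best at 0.65. A custom threshold can be applied through predict_proba (a minimal sketch):
# probability of the positive class for each test sample
proba = model.predict_proba(X_test)[:, 1]
# apply a custom decision threshold instead of the default 0.5
custom_predictions = (proba >= 0.65).astype(int)
print(accuracy_score(y_test, custom_predictions))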