A Complete Recommender System From Scratch in Python Using Linear Regression

Nowadays, we see recommendation systems everywhere. When you buy something in an online marketplace like Amazon or eBay, the site suggests similar products. On Netflix or YouTube, your homepage shows suggestions similar to your previous activity or searches. How do they do it? They all follow one idea: they take data from your previous activity and run a similarity analysis. Based on that analysis, they suggest more products, videos, or movies you may like.


In this article, I will explain a recommender system that uses the same idea. Here is the list of topics covered:

  1. The ideas and formulas behind the recommendation system.
  2. Developing the recommendation system algorithm from scratch.
  3. Using that algorithm to recommend movies for me.

I will use some of Python’s libraries like NumPy, pandas, and Matplotlib for efficient and faster computation. Our datasets are not too large, but we want to develop something that will work for even bigger datasets. I used a Jupyter Notebook environment. Feel free to use any other notebook of your choice.

How Does This Recommender System Work?

In this section, I will provide a high-level overview of the process. If it is not entirely clear, please keep reading: in the next sections I will implement all these ideas in Python code, which should make them clearer.

Let’s dive in. Suppose our dataset looks something like this:

Here, we have five movies, four users, and two features. Each user rated some of the movies. Of course, not every user watched every movie, so some ratings are not available.

Finally, we have two features: Romance and Action. They give you an idea of how romantic or how action-packed the movie is. Each feature ranges between 0 and 1: a romance value of 0 means no romance and 1 means full of romance.
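The table itself is not reproduced here, so the sketch below builds a small stand-in with the same layout: five movies, four users, and two features. Every title, user name, and number in it is invented for illustration, and `np.nan` marks a rating the user did not provide.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings (0-5); np.nan means "not rated". All values are made up.
ratings = pd.DataFrame(
    {
        "User A": [5, 5, np.nan, 0, 0],
        "User B": [5, np.nan, 4, 0, 0],
        "User C": [0, np.nan, 0, 5, 5],
        "User D": [0, 0, np.nan, 4, np.nan],
    },
    index=["Movie 1", "Movie 2", "Movie 3", "Movie 4", "Movie 5"],
)

# Hypothetical feature scores between 0 and 1 for each movie.
features = pd.DataFrame(
    {"Romance": [0.9, 1.0, 0.99, 0.1, 0.0], "Action": [0.0, 0.01, 0.0, 1.0, 0.9]},
    index=ratings.index,
)

# Indicator table: 1 where a rating exists, 0 where it is missing.
rated = ratings.notna().astype(int)
print(ratings.join(features))
```

The `rated` table plays the same role as the ‘r’ dataset used later in the article.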

This algorithm builds the recommendation system from the user ratings.

If a user watched a lot of movies and rated them, this algorithm will work the best for that user.

But if a certain user did not provide any rating, he or she will get the recommendation based on the other users’ ratings.

What to Do With the Missing Movie Ratings?

This is a valid question. Not all users watch all the movies, and sometimes they simply do not rate a movie after watching it. So, it is normal to have a lot of missing data, and we need a way to fill in those missing values.

A linear regression method can be used to fill in the missing data. As a reminder, here is the formula for linear regression:

Y = C + BX

We all learned this equation of a straight line in high school. Here, Y is the dependent variable, X is the independent variable, B is the slope, and C is the intercept. Traditionally, for linear regression, the same formula is written as:

h = theta0 + theta1 * X

Here, ‘h’ is the hypothesis or the predicted value, X is the input feature, and theta0 and theta1 are the coefficients.

If you did not work on linear regression before, please check this article on linear regression first.

In this recommendation system, we will use the features of a movie as the input X, learn the coefficients from the known ratings, and then predict the missing ratings. We will leave out the bias term theta0.

Here, theta1 is initialized randomly in the beginning and is refined with each iteration, as in any linear regression algorithm.
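As a small illustration of the prediction step without the bias term, the sketch below treats a movie's feature values as X and one user's coefficients as theta. The numbers and the names `x_movie` and `theta_user` are hypothetical.

```python
import numpy as np

# Hypothetical movie features: [romance, action]
x_movie = np.array([0.9, 0.1])
# Hypothetical coefficients for one user who likes romance and is indifferent to action
theta_user = np.array([5.0, 0.0])

# Prediction without a bias term: h = theta . x
predicted_rating = float(np.dot(theta_user, x_movie))
print(predicted_rating)
```

With more than one feature, theta1 * X simply becomes a dot product between the user's coefficients and the movie's features.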

How to Refine the Values of Theta1?

As in linear regression, we will train the algorithm with the known values. Take the known ratings of a movie, predict those ratings using the formula above, and compare the predictions with the original ratings to find the error term. Here is the error for one rating.

In the same way, we need to find the error for all the ratings. Before that, I want to introduce the notation I will use throughout this tutorial:
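The notation and the single-rating error are not reproduced in this copy. The block below is a reconstruction that assumes the usual collaborative filtering conventions of the course this dataset comes from; it is chosen to match the r(i, j) used in the text and the array shapes used in the code later on.

```latex
\begin{aligned}
n_m &: \text{number of movies}, \qquad n_u : \text{number of users}, \qquad n : \text{number of features} \\
r(i,j) &= 1 \text{ if user } j \text{ rated movie } i, \text{ and } 0 \text{ otherwise} \\
y^{(i,j)} &: \text{rating given by user } j \text{ to movie } i \text{ (defined only when } r(i,j) = 1\text{)} \\
x^{(i)} &: \text{feature vector of movie } i \\
\theta^{(j)} &: \text{coefficient vector of user } j \\
\text{error for one rating} &= (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)}
\end{aligned}
```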

Here is the formula for the total cost function which will indicate the distance between the predicted ratings and the original ratings.
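The formula is not reproduced in this copy. A reconstruction consistent with the description in the next two paragraphs (a squared-error term over the rated entries plus a regularization term) is given below. It assumes that x^(i) is the feature vector of movie i, theta^(j) is the coefficient vector of user j, n_u is the number of users, n is the number of features, and lambda is the regularization strength.

```latex
J(\theta) = \frac{1}{2} \sum_{(i,j)\,:\,r(i,j)=1} \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right)^{2}
          + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^{2}
```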

The first term of this formula is the squared error. We square the error so that negative and positive errors do not cancel out, and the factor 1/2 simplifies the derivative of the squared term. The error is summed only where r(i, j) = 1, because r(i, j) = 1 means the user provided a rating.

The second term of the equation above is the regularization term. It penalizes large parameter values, which helps prevent overfitting.

Total Cost Function?

How do we know whether a movie is romantic, a comedy, or action-packed? It would be really time-consuming and expensive to watch every movie and work out its features by hand.

So, we can learn the features from the users’ ratings. Just as we randomly initialize the theta values and slowly refine them with iterations, we can do the same to find the features X. The formula is almost identical to the cost function above.

In both cases, the error term is the same. We can combine the cost functions for theta and X as below:
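The combined formula is not reproduced in this copy. Under the same assumed notation (x^(i) for the features of movie i, theta^(j) for the coefficients of user j, n_m movies, n_u users, n features), a reconstruction that matches the cost computed in the code later in the article is:

```latex
J(x, \theta) = \frac{1}{2} \sum_{(i,j)\,:\,r(i,j)=1} \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right)^{2}
             + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^{2}
             + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^{2}
```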

Gradient Descent

Gradient descent updates the theta and X in each iteration. Here is the formula for gradient descent:
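The update formula is not reproduced in this copy. Taking the partial derivatives of the combined cost with respect to each feature value and each coefficient gives the following reconstruction, again under the assumed notation x^(i), theta^(j), and r(i, j):

```latex
\begin{aligned}
x_k^{(i)} &:= x_k^{(i)} - \alpha \left( \sum_{j\,:\,r(i,j)=1} \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right) \\
\theta_k^{(j)} &:= \theta_k^{(j)} - \alpha \left( \sum_{i\,:\,r(i,j)=1} \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)
\end{aligned}
```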

In this formula, alpha is the learning rate. In each iteration, the theta and X values are updated, and they eventually become stable.

Implementation of Movie Recommendation Algorithm

I will use the datasets from Andrew Ng’s Machine Learning course on Coursera. He is the best person to break down a machine learning problem into pieces. Feel free to download the datasets from this link and follow along.

If you are reading this to learn, the best way is to run the code yourself as you read.

Step 1:

Import the necessary packages and the datasets.

import numpy as np
import pandas as pd

The first dataset contains the ratings of all the users for all the movies.

y = pd.read_excel('ex8_movies.xlsx', sheet_name = 'y', header=None)

The next dataset contains True if the user provided a rating and False if the user did not.

r = pd.read_excel('ex8_movies.xlsx', sheet_name='R', header=None)

We need to convert these boolean values into numeric values. I will replace True with 1 and False with 0.

for i in range(len(r.columns)):
    r[i] = r[i].replace({True: 1, False: 0})
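If the sheet really contains only boolean values, the same conversion can also be done in one step. The sketch below uses a small made-up DataFrame named `r_demo` in place of the real `r`.

```python
import pandas as pd

# Made-up stand-in for the rating-indicator DataFrame
r_demo = pd.DataFrame([[True, False], [False, True], [True, True]])

# Cast every boolean column to integers: True -> 1, False -> 0
r_demo = r_demo.astype(int)
print(r_demo)
```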

We have features X in this dataset:

X = pd.read_excel('movie_params.xlsx', sheet_name='X', header=None)

Theta values are stored here:

theta = pd.read_excel('movie_params.xlsx', sheet_name='theta', header=None)

Let’s check the shape of the datasets.

y.shape

(1682, 943)

That means we have 1682 movies and 943 users.

X.shape

(1682, 10)

As you remember, X contains the features. We have 10 features in this dataset.

theta.shape

(943, 10)

r.shape

(1682, 943)

These are all the datasets needed for this algorithm. X and theta can also be randomly initialized; we will see that towards the end. For now, we will use this X and theta.

Now, we will develop all the necessary functions.

Cost Function

Very simply, we will use the total cost function we described above. It takes X, y, r, theta, and Lambda as input and returns two values: the unregularized cost and the regularized cost.

def costfunction(X, y, r, theta, Lambda):
    predictions = np.dot(X, theta.T)
    err = predictions - y
    # Squared error, counted only where a rating exists (r == 1)
    J = 1/2 * np.sum((err**2) * r)
    # Regularization terms for theta and for X
    reg_theta = Lambda/2 * np.sum(np.sum(theta**2))
    reg_x = Lambda/2 * np.sum(np.sum(X**2))
    reg_J = J + reg_theta + reg_x
    return J, reg_J
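
To see what a cost restricted to the rated entries computes, here is a tiny case that can be checked by hand. The arrays are made up, and `masked_cost` is a hypothetical helper written for this sketch, not the function above; it returns only the regularized cost.

```python
import numpy as np

def masked_cost(X, y, r, theta, lam):
    """Regularized squared-error cost over the rated entries only."""
    err = (X @ theta.T - y) * r          # zero out entries with no rating
    return 0.5 * np.sum(err**2) + lam / 2 * (np.sum(X**2) + np.sum(theta**2))

# Two movies, two users, one feature (all values invented)
X = np.array([[1.0], [2.0]])             # movie features
theta = np.array([[1.0], [2.0]])         # user coefficients
y = np.array([[3.0, 0.0], [0.0, 5.0]])   # ratings; 0 where unrated
r = np.array([[1, 0], [0, 1]])           # only the diagonal is rated

# Predictions are [[1, 2], [2, 4]], so the rated errors are 1-3=-2 and 4-5=-1.
# Squared-error part: 0.5 * (4 + 1) = 2.5
# Regularization with lam=1: 0.5 * ((1 + 4) + (1 + 4)) = 5.0
cost = masked_cost(X, y, r, theta, lam=1.0)
print(cost)
```

The two unrated entries contribute nothing to the cost, even though the model predicts nonzero values for them.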

Gradient Descent

In this function, we will use the gradient descent formulas discussed above. It takes X, y, r, theta, Lambda, the number of iterations, and alpha as parameters. We record the regularized cost in each iteration using the cost function and return the updated X, theta, and the list of costs.

def gradientDescent(X, y, r, theta, Lambda, num_iter, alpha):
    J_hist = []
    for i in range(num_iter):
        # costfunction returns the unregularized and the regularized cost
        cost, reg_cost = costfunction(X, y, r, theta, Lambda)
        J_hist.append(reg_cost)
        # Error on the rated entries only (r == 1)
        err = (np.dot(X, theta.T) - y) * r
        X_grad = np.dot(err, theta) + Lambda*X
        theta_grad = np.dot(err.T, X) + Lambda*theta
        X = X - alpha*X_grad
        theta = theta - alpha*theta_grad
    return X, theta, J_hist
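
The sketch below runs this kind of update on a small synthetic problem to check that the cost goes down. Everything in it is an assumption made for the demo: the data is random, the hyperparameters are arbitrary, and `masked_cost` and `gradient_descent_demo` are hypothetical names rather than part of the article's code.

```python
import numpy as np

def masked_cost(X, y, r, theta, lam):
    """Regularized squared-error cost over the rated entries only."""
    err = (X @ theta.T - y) * r
    return 0.5 * np.sum(err**2) + lam / 2 * (np.sum(X**2) + np.sum(theta**2))

def gradient_descent_demo(X, y, r, theta, lam, num_iter, alpha):
    history = []
    for _ in range(num_iter):
        history.append(masked_cost(X, y, r, theta, lam))
        err = (X @ theta.T - y) * r            # error on rated entries only
        X_grad = err @ theta + lam * X
        theta_grad = err.T @ X + lam * theta
        # Both updates use the same error (simultaneous update)
        X = X - alpha * X_grad
        theta = theta - alpha * theta_grad
    return X, theta, history

rng = np.random.default_rng(0)
num_movies, num_users, num_features = 6, 4, 2
y = rng.integers(1, 6, size=(num_movies, num_users)).astype(float)  # ratings 1-5
r = (rng.random((num_movies, num_users)) < 0.7).astype(int)         # roughly 70% rated
y = y * r                                                           # 0 where unrated
X0 = rng.standard_normal((num_movies, num_features))
theta0 = rng.standard_normal((num_users, num_features))

X_fit, theta_fit, history = gradient_descent_demo(
    X0, y, r, theta0, lam=0.1, num_iter=200, alpha=0.01
)
print(history[0], history[-1])
```

Multiplying the error by `r` is what keeps the unrated entries from being treated as ratings of zero.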

In this function, we will normalize the ratings ‘y’. First, we calculate the mean rating of each movie: we sum the ratings of the movie and divide by the sum of ‘r’ for that movie. Remember, the ‘r’ dataset contains 1 if the user provided a rating and 0 if the user did not.

The normalized y is each provided rating minus the mean rating of its movie. Entries without a rating stay at 0.

def normalizeRatings(y, r):
    y, r = np.asarray(y, dtype=float), np.asarray(r, dtype=float)
    # Mean rating per movie over the users who rated it
    ymean = np.sum(y, axis=1)/np.sum(r, axis=1)
    # Subtract the movie mean from the rated entries only
    ynorm = (y - ymean[:, np.newaxis]) * r
    return ymean, ynorm
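
A small worked case makes the mean normalization concrete. The values are made up, and `mean_normalize` is a hypothetical helper defined only for this sketch; it assumes every movie has at least one rating, since otherwise the division would be by zero.

```python
import numpy as np

def mean_normalize(y, r):
    """Subtract each movie's mean rating from its rated entries."""
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    ymean = np.sum(y * r, axis=1) / np.sum(r, axis=1)   # mean over rated entries
    ynorm = (y - ymean[:, np.newaxis]) * r               # unrated entries stay 0
    return ymean, ynorm

# Two movies, three users; y holds 0 wherever r is 0
y = np.array([[5.0, 3.0, 0.0],
              [0.0, 2.0, 4.0]])
r = np.array([[1, 1, 0],
              [0, 1, 1]])

ymean, ynorm = mean_normalize(y, r)
print(ymean)   # movie means over rated entries: 4.0 and 3.0
print(ynorm)
```

The first movie has ratings 5 and 3, so its mean is 4 and its normalized ratings are 1 and -1.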

All the functions needed to make a recommendation are now ready. Let’s use them.

Recommend a Movie For Me

To recommend movies for me, I need to provide ratings for some movies.

my_ratings = np.zeros((1682,1))
my_ratings[5] = 5 
my_ratings[50] = 1
my_ratings[9] = 5
my_ratings[27]= 4
my_ratings[58] = 3
my_ratings[88]= 2
my_ratings[123]= 4
my_ratings[165] = 1
my_ratings[187]= 3
my_ratings[196] = 2
my_ratings[228]= 4
my_ratings[258] = 5 
my_ratings[343] = 4
my_ratings[478] = 1
my_ratings[511]= 4
my_ratings[690] = 5
my_ratings[722]= 1
my_ratings[789]= 3
my_ratings[832] = 2
my_ratings[1029]= 4
my_ratings[1190] = 2
my_ratings[1245]= 5

I will add my ratings to the ‘y’ DataFrame now.

y1 = np.hstack((my_ratings, y))

My ratings also need to be added to the ‘r’ DataFrame. That is a little different, because ‘r’ only shows whether I provided a rating for a movie or not. So I will put 0 at the indices where my_ratings is zero and 1 where my_ratings has a value.

my_r = np.zeros((1682,1))
for i in range(len(r)):
    if my_ratings[i] !=0:
        my_r[i] = 1

Now add this my_r list to the ‘r’ DataFrame and name it r1.

r1 = np.hstack((my_r, r))
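
The same two steps can be tried on a tiny made-up example. The names ending in `_demo` are hypothetical; the sketch builds the indicator column with a vectorized comparison instead of a loop and confirms that the new user ends up in column 0.

```python
import numpy as np

# Made-up ratings from a new user for five movies (0 means "not rated")
my_ratings_demo = np.zeros((5, 1))
my_ratings_demo[1] = 4
my_ratings_demo[3] = 2

# 1 where a rating was given, 0 elsewhere
my_r_demo = (my_ratings_demo != 0).astype(int)

# Made-up existing ratings for two other users
y_demo = np.array([[5, 0], [0, 3], [1, 0], [0, 0], [2, 2]])

# Stacking the new user first puts that user's ratings in column 0
y1_demo = np.hstack((my_ratings_demo, y_demo))
print(my_r_demo.ravel())
print(y1_demo)
```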

We need to find a normalized y value using the new y1 and r1.

ymean, ynorm = normalizeRatings(y1, r1)

Do you remember, I said in the beginning that we will use randomly initialized X and theta towards the end?

This is the time. Let’s try to recommend movies for me using randomly initialized theta and X.

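num_users = y1.shape[1]
num_movies = y1.shape[0]
num_features = 10
X1 = np.random.randn(num_movies, num_features)
Theta1 = np.random.randn(num_users, num_features)
# Mean-centered ratings, zero where unrated, so that ymean can be added back after prediction
y_centered = (y1 - ymean[:, np.newaxis]) * r1
x_up, theta_up, J_hist = gradientDescent(X1, y_centered, r1, Theta1, 10, 500, 0.001)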

So, we have the cost for every iteration. Let’s plot it.

import matplotlib.pyplot as plt
plt.plot(J_hist)
plt.title("Cost function using Gradient Descent")
plt.show()

Look! After some iterations, the cost function became stable. That means the X and Theta values were stable at that point.

Recommend the Movies

Using the simple linear regression formula I mentioned in the beginning, let’s predict the ratings. Notice, I did not rate all the movies. So, here using the updated parameters we will predict the ratings for all the movies for all the users.

p = np.dot(x_up, theta_up.T)

My ratings are in the first column, because I added them first in np.hstack. Take that column only and add back the ‘ymean’ returned by the ‘normalizeRatings’ function above to undo the normalization.

my_predictions = p[:, 0] + ymean
my_predictions = pd.DataFrame(my_predictions)

Now, I will add these ratings to the movie list. First import the movie list.

movies = open('movie_ids.txt', 'r').read().split("\n")[:-1]
# Keep the predictions numeric so that sorting is by value, not by text
df = pd.DataFrame({'predicted_rating': my_predictions[0], 'movie': movies})

We have the movie list and my predicted ratings side by side now in a DataFrame. If we sort this DataFrame by the predicted rating, I will find the top-recommended movies for myself. Here I get the top 10 recommended movies according to my ratings:
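
The sorting step can be sketched on a small made-up table. The column names `predicted_rating` and `movie` and all the values below are assumptions for the demo; with the real DataFrame, `head(10)` would give the top 10.

```python
import pandas as pd

# Made-up predictions and titles standing in for the real ones
demo = pd.DataFrame({
    "predicted_rating": [3.2, 4.8, 4.1, 2.5],
    "movie": ["Movie A", "Movie B", "Movie C", "Movie D"],
})

# Highest predicted rating first, then keep the top rows
top = demo.sort_values(by="predicted_rating", ascending=False).head(2)
print(top)
```

Sorting only gives a meaningful ranking when the rating column is numeric; if the ratings are stored as text, the order is alphabetical instead.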


These are the movies recommended for me as per the ratings I provided.


This algorithm works using the ratings of the users. It is possible to develop a similar recommendation system based on previous buying records, search records, or watch records. I hope this was helpful. Please do not hesitate to ask in the comment section if you have any questions.

