Polynomial regression is an improved version of linear regression. If you know linear regression, it will be simple for you. If not, I will explain the formulas here in this article. There are other, more advanced and more efficient machine learning algorithms out there. But it is still a good idea to learn linear regression based techniques, because they are simple, fast, and work with well-known formulas, though they may not work well with a complex set of data.
Polynomial Regression Formula
Linear regression can perform well only if there is a linear correlation between the input variables and the output variable. That is why the polynomial regression technique was developed. It can find the relationship between the input features and the output variable in a better way, even when the relationship is not linear. It starts from the same formula as linear regression:
Y = BX + C
I am sure we all learned this formula in school. For linear regression, we use symbols like this:
Y = θ0 + θ1X
Here, we get X and Y from the dataset. X is the input feature and Y is the output variable. Theta values are initialized randomly.
For polynomial regression, the formula becomes like this:
Y = θ0 + θ1X + θ2X^2 + θ3X^3 + θ4X^4
We are adding more terms here. We use the same input feature and raise it to different powers to create more features. That way, our algorithm will be able to learn the data better.
The powers do not have to be 2, 3, or 4. They could be 1/2, 1/3, or 1/4 as well. Then the formula will look like this:
Y = θ0 + θ1X + θ2X^(1/2) + θ3X^(1/3) + θ4X^(1/4)
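To make this concrete, here is a small sketch of how both integer and fractional powers generate extra feature columns. The x array below is made up just for illustration; it is not the dataset used later in this article.
import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0])                  # made-up input feature, for illustration only
X_int = np.column_stack([x, x**2, x**3])            # integer powers: X, X^2, X^3
X_frac = np.column_stack([x, x**(1/2), x**(1/3)])   # fractional powers: X, X^(1/2), X^(1/3)
print(X_int)
print(X_frac)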
Cost Function And Gradient Descent
The cost function gives an idea of how far the predicted hypothesis is from the actual values. The formula is:
J(θ) = 1/(2m) * Σ(h(x) − y)^2
This equation may look complicated, but it is doing a simple calculation: subtract the hypothesis from the original output variable, square the difference to eliminate negative values, sum that over all the training examples, and then divide by 2 times the number of training examples.
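To make the calculation concrete, here is a tiny sketch with made-up numbers (y_pred and y_true are not taken from the dataset used later in this article):
import numpy as np
y_pred = np.array([3.0, 5.0, 7.0])    # made-up hypothesis values
y_true = np.array([2.0, 5.0, 9.0])    # made-up actual output values
cost_demo = np.sum((y_pred - y_true)**2) / (2*len(y_true))   # squared errors, summed, divided by 2m
print(cost_demo)   # (1 + 0 + 4) / 6 ≈ 0.833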
What is gradient descent? It helps fine-tune our randomly initialized theta values. I am not going into the differential calculus here. If you take the partial derivative of the cost function with respect to each theta, you can derive these formulas:
θj := θj − (α/m) * Σ(h(x) − y) * xj   (one update like this for each θj)
Here, alpha is the learning rate. You choose the value of alpha.
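Here is a small sketch of what one such update looks like in vectorized form. The X_demo, y_demo, and theta_demo arrays are made up just for illustration; the actual step-by-step implementation comes later in this article.
import numpy as np
alpha = 0.05                                         # learning rate, chosen by you
X_demo = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # made-up features; first column is the bias
y_demo = np.array([2.0, 4.0, 6.0])                   # made-up outputs
theta_demo = np.zeros(2)                             # initial theta values
m_demo = len(y_demo)
y_pred = X_demo.dot(theta_demo)                      # hypothesis
gradient = X_demo.T.dot(y_pred - y_demo) / m_demo    # partial derivative of the cost for each theta
theta_demo = theta_demo - alpha * gradient           # one gradient descent update
print(theta_demo)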
Python Implementation of Polynomial Regression
Here is the step-by-step implementation of polynomial regression.
1. We will use a simple dummy dataset for this example that contains the salaries for different positions. Import the dataset:
import pandas as pd
import numpy as np
df = pd.read_csv('position_salaries.csv')
df.head()

2. Add the bias column for theta 0. This bias column will contain only 1s, because multiplying a number by 1 does not change it.
df = pd.concat([pd.Series(1, index=df.index, name='00'), df], axis=1)
df.head()

3. Delete the ‘Position’ column, because it contains strings and the algorithm cannot work with strings. We have the ‘Level’ column to represent the positions.
df = df.drop(columns='Position')
4. Define our input variable X and the output variable y. In this example, ‘Level’ is the input feature and ‘Salary’ is the output variable. We want to predict the salary from the level.
y = df['Salary']
X = df.drop(columns = 'Salary')
X.head()

5. Raise the ‘Level’ column to higher powers to make the ‘Level1’ (squared) and ‘Level2’ (cubed) columns.
X['Level1'] = X['Level']**2
X['Level2'] = X['Level']**3
X.head()

6. Now, normalize the data. Divide each column by the maximum value of that column. That way, the values of each column will range from 0 to 1. The algorithm should work even without normalization, but it helps gradient descent converge faster. Also, calculate the value of m, which is the length of the dataset.
m = len(X)
X = X/X.max()
7. Define the hypothesis function. It will use X and theta to predict y.
def hypothesis(X, theta):
    y1 = theta*X                 # multiply each feature column by its theta
    return np.sum(y1, axis=1)    # sum across the columns to get the prediction
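As a quick sanity check (this is just an illustration, not one of the original steps), with an all-zero theta the hypothesis should predict 0 for every row:
theta_test = np.array([0.0]*len(X.columns))
print(hypothesis(X, theta_test).head())   # all zeros before any training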
8. Define the cost function, using the cost formula above:
def cost(X, y, theta):
    y1 = hypothesis(X, theta)
    return sum((y1-y)**2)/(2*m)   # sum of squared errors, divided by 2m
9. Write the function for gradient descent. We will keep updating the theta values until we find our optimum cost. For each iteration, we will calculate the cost for future analysis.
def gradientDescent(X, y, theta, alpha, epoch):
    J = []                        # cost of each epoch, kept for later analysis
    k = 0
    while k < epoch:
        y1 = hypothesis(X, theta)
        for c in range(0, len(X.columns)):
            theta[c] = theta[c] - alpha*sum((y1-y)*X.iloc[:, c])/m   # update each theta
        j = cost(X, y, theta)
        J.append(j)
        k += 1
    return J, theta
10. All the functions are defined. Now, initialize theta. I am initializing an array of zeros. You can use any other random values instead. I am choosing alpha as 0.05 and I will iterate the theta values for 700 epochs.
theta = np.array([0.0]*len(X.columns))
J, theta = gradientDescent(X, y, theta, 0.05, 700)
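If you like, you can inspect the final cost and the learned theta values before moving on (this is just an optional check, not one of the original steps):
print(J[-1])    # final cost after 700 epochs
print(theta)    # the learned theta values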
11. We got our final theta values and the cost in each iteration as well. Let’s find the salary prediction using our final theta.
y_hat = hypothesis(X, theta)
12. Now plot the original salary and our predicted salary against the levels.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(x=X['Level'], y=y)
plt.scatter(x=X['Level'], y=y_hat)
plt.show()

Our prediction does not follow the trend of the salaries exactly, but it is close. Linear regression can only return a straight line, while polynomial regression can produce a curved line like this. And if the trend is not a nice smooth curve, polynomial regression can learn more complex shapes as well.
13. Let’s plot the cost we calculated in each epoch in our gradient descent function.
plt.figure()
plt.scatter(x=list(range(0, 700)), y=J)
plt.show()

The cost fell drastically in the beginning and then the fall slowed down. In a good machine learning algorithm, the cost should keep going down until convergence. Please feel free to try it with a different number of epochs and different learning rates (alpha).
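For example, here is a small sketch of how you could compare the cost curves for two learning rates (the values 0.01 and 0.1 are arbitrary choices, just for experimentation):
theta_a = np.array([0.0]*len(X.columns))
theta_b = np.array([0.0]*len(X.columns))
J_a, theta_a = gradientDescent(X, y, theta_a, 0.01, 700)   # smaller learning rate
J_b, theta_b = gradientDescent(X, y, theta_b, 0.1, 700)    # larger learning rate
plt.figure()
plt.plot(J_a, label='alpha = 0.01')
plt.plot(J_b, label='alpha = 0.1')
plt.legend()
plt.show()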
Here is the dataset: salary_data
Follow this link for the full working code: Polynomial Regression
I hope this was helpful.