 Linear Regression in Python

## Complete Detailed Tutorial on Linear Regression in Python

```python
import seaborn as sns

# Load the iris dataset that ships with seaborn
iris = sns.load_dataset('iris')
iris
```
```python
# Keep only the two columns used in the simple regression
iris = iris[['petal_length', 'petal_width']]

X = iris['petal_length']  # feature
y = iris['petal_width']   # target
```
```python
import matplotlib.pyplot as plt

plt.scatter(X, y)
plt.xlabel("petal length")
plt.ylabel("petal width")
```
```python
from sklearn.model_selection import train_test_split

# Hold out 40% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=23)
```
```python
X_train
```
```
77     5.0
29     1.6
92     4.0
23     1.7
128    5.6
...
39     1.5
91     4.6
31     1.5
40     1.3
83     5.1
Name: petal_length, Length: 90, dtype: float64
```
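The `Length: 90` above follows from `test_size=0.4` applied to 150 rows. A minimal sketch of that arithmetic, using a stand-in array of 150 values rather than the real data (`values` and `targets` are illustrative names, not part of the tutorial):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 150 rows, the same number of rows as the iris dataset.
values = np.arange(150, dtype=float)
targets = values * 2.0

v_train, v_test, t_train, t_test = train_test_split(
    values, targets, test_size=0.4, random_state=23
)

# 40% of the rows go to the test set, the rest to the training set.
print(len(v_train), len(v_test))

# Every row lands in exactly one of the two sets.
print(sorted(np.concatenate([v_train, v_test])) == sorted(values))
```

Fixing `random_state` means the same rows are selected on every run, which keeps later numbers comparable between runs.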
```python
import numpy as np

# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
X_train = np.array(X_train).reshape(-1, 1)
X_train
```
```
array([[5. ],
       [1.6],
       [4. ],
       [1.7],
       [5.6],
       [4. ],
       [4.8],
       [5.6],
       [5.1],
       [4.9],
       [1.4],
       [1.6],
       [5.6],
       [1.4],
       [1.6],
       [5.5],
       [5.1],
       [4. ],
       [1.4],
       ...
```
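The reshape is needed because a single pandas column is one-dimensional, while scikit-learn estimators expect features with shape `(n_samples, n_features)`. A small sketch with made-up values (the `column` series below is illustrative, not the real training data):

```python
import numpy as np
import pandas as pd

# A single column, like iris['petal_length'], is one-dimensional.
column = pd.Series([5.0, 1.6, 4.0], name="petal_length")
print(column.shape)

# reshape(-1, 1): -1 means "infer the number of rows", 1 means one feature column.
as_2d = np.array(column).reshape(-1, 1)
print(as_2d.shape)

# An equivalent 2D result without NumPy: convert the Series to a one-column DataFrame.
as_frame = column.to_frame()
print(as_frame.shape)
```

Either form should work as the feature input to `fit`; the tutorial uses the NumPy version.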
```python
X_test = np.array(X_test).reshape(-1, 1)
X_test
```
```
array([[5.4],
       [6. ],
       [4.1],
       [1.5],
       [5. ],
       [4.9],
       [1.7],
       [5.5],
       [1.7],
       [3.6],
       [4.7],
       [1.6],
       [5.9],
       [1.5],
       [1.5],
       [5.1],
       [4.5],
       [4.7],
       [6.1],
       [1.4],
       [5.3],
       [1.4],
       [1.6],
       [1.3],
       [5.6],
       ...
```
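```python
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)
```
```python
# Intercept of the fitted line
c = lr.intercept_
c
```
```
-0.3511327422143744
```
```python
# Slope of the fitted line (one coefficient per feature)
m = lr.coef_
m
```
```
array([0.41684538])
```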
```python
# Predict by hand with the line equation y = m*x + c
Y_pred_train = m*X_train + c
Y_pred_train.flatten()
```
```
array([1.73309416, 0.31581987, 1.31624878, 0.3575044 , 1.98320139,
       1.31624878, 1.64972508, 1.98320139, 1.7747787 , 1.69140962,
       0.23245079, 0.31581987, 1.98320139, 0.23245079, 0.31581987,
       1.94151685, 1.7747787 , 1.31624878, 0.23245079, 1.35793332,
       1.85814777, 1.52467147, 2.06657046, 2.40004677, 1.44130239,
       0.19076625, 1.31624878, 1.69140962, 1.69140962, 1.31624878,
       0.27413533, 1.52467147, 1.52467147, 1.27456424, 1.73309416,
       1.64972508, 1.2328797 , 1.7747787 , 2.27499315, 2.19162408,
       0.14908171, 2.02488593, 0.8994034 , 0.27413533, 2.108255  ,
       1.64972508, 0.23245079, 1.52467147, 1.39961786, 1.81646324,
       0.19076625, 0.06571264, 1.10782609, 0.10739718, 1.60804055,
       1.39961786, 0.14908171, 2.06657046, 1.44130239, 1.52467147,
       0.31581987, 2.52510038, 1.56635601, 1.7747787 , 1.98320139,
       1.60804055, 0.27413533, 0.31581987, 1.94151685, 2.06657046,
       1.48298693, 0.19076625, 1.81646324, 1.02445701, 2.02488593,
       1.10782609, 0.19076625, 0.27413533, 0.27413533, 1.7747787 ,
       0.23245079, 0.23245079, 1.69140962, 0.23245079, 1.48298693,
       0.27413533, 1.56635601, 0.27413533, 0.19076625, 1.7747787 ])
```
```python
# The same predictions, this time from the model's predict method
y_pred_train1 = lr.predict(X_train)
y_pred_train1
```
```
array([1.73309416, 0.31581987, 1.31624878, 0.3575044 , 1.98320139,
       1.31624878, 1.64972508, 1.98320139, 1.7747787 , 1.69140962,
       0.23245079, 0.31581987, 1.98320139, 0.23245079, 0.31581987,
       1.94151685, 1.7747787 , 1.31624878, 0.23245079, 1.35793332,
       1.85814777, 1.52467147, 2.06657046, 2.40004677, 1.44130239,
       0.19076625, 1.31624878, 1.69140962, 1.69140962, 1.31624878,
       0.27413533, 1.52467147, 1.52467147, 1.27456424, 1.73309416,
       1.64972508, 1.2328797 , 1.7747787 , 2.27499315, 2.19162408,
       0.14908171, 2.02488593, 0.8994034 , 0.27413533, 2.108255  ,
       1.64972508, 0.23245079, 1.52467147, 1.39961786, 1.81646324,
       0.19076625, 0.06571264, 1.10782609, 0.10739718, 1.60804055,
       1.39961786, 0.14908171, 2.06657046, 1.44130239, 1.52467147,
       0.31581987, 2.52510038, 1.56635601, 1.7747787 , 1.98320139,
       1.60804055, 0.27413533, 0.31581987, 1.94151685, 2.06657046,
       1.48298693, 0.19076625, 1.81646324, 1.02445701, 2.02488593,
       1.10782609, 0.19076625, 0.27413533, 0.27413533, 1.7747787 ,
       0.23245079, 0.23245079, 1.69140962, 0.23245079, 1.48298693,
       0.27413533, 1.56635601, 0.27413533, 0.19076625, 1.7747787 ])
```
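The two arrays above match because `predict` evaluates the same line, y = m*x + c, that was computed by hand. A minimal sketch of that check follows. It is an assumption-laden stand-in: it uses the iris copy bundled with scikit-learn (`load_iris`) instead of the seaborn one so that it runs offline, so the fitted numbers may differ slightly from those printed in this tutorial.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = load_iris()
# In scikit-learn's copy, column 2 is petal length and column 3 is petal width.
X = data.data[:, [2]]  # indexing with a list keeps the array 2D: (150, 1)
y = data.data[:, 3]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=23
)

lr = LinearRegression().fit(X_train, y_train)

# Manual prediction with the fitted slope and intercept.
manual = (lr.coef_ * X_train + lr.intercept_).flatten()
# Prediction through the estimator API.
from_predict = lr.predict(X_train)

print(np.allclose(manual, from_predict))
```

`np.allclose` is used instead of `==` because the two computations can differ in the last floating-point digits.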
```python
import matplotlib.pyplot as plt

# Training points with the fitted line on top
plt.scatter(X_train, y_train)
plt.plot(X_train, y_pred_train1, color='red')
plt.xlabel("petal length")
plt.ylabel("petal width")
```

```python
# Predictions for the held-out test set
y_pred_test1 = lr.predict(X_test)
y_pred_test1
```
```
array([1.89983231, 2.14993954, 1.35793332, 0.27413533, 1.73309416,
       1.69140962, 0.3575044 , 1.94151685, 0.3575044 , 1.14951063,
       1.60804055, 0.31581987, 2.108255  , 0.27413533, 0.27413533,
       1.7747787 , 1.52467147, 1.60804055, 2.19162408, 0.23245079,
       1.85814777, 0.23245079, 0.31581987, 0.19076625, 1.98320139,
       0.23245079, 0.44087348, 1.64972508, 1.48298693, 1.27456424,
       0.27413533, 1.27456424, 0.19076625, 2.44173131, 0.27413533,
       0.3575044 , 1.56635601, 1.02445701, 1.39961786, 2.14993954,
       2.02488593, 0.44087348, 1.19119517, 0.23245079, 1.48298693,
       1.73309416, 1.52467147, 2.31667769, 0.27413533, 1.35793332,
       2.19162408, 1.89983231, 0.23245079, 1.98320139, 1.52467147,
       1.60804055, 2.44173131, 1.39961786, 0.23245079, 1.7747787 ])
```
```python
import matplotlib.pyplot as plt

# Test points with the line fitted on the training data
plt.scatter(X_test, y_test)
plt.plot(X_test, y_pred_test1, color='red')
plt.xlabel("petal length")
plt.ylabel("petal width")
```
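The plot gives a visual impression of the fit on the test set, but the tutorial does not compute a score for this simple model. One possible way to quantify it is sketched below with `r2_score` and `mean_squared_error` from scikit-learn. As an assumption, it again uses scikit-learn's bundled iris copy (`load_iris`) so it runs offline, so the exact numbers may not match the seaborn data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

data = load_iris()
X = data.data[:, [2]]  # petal length (cm), kept 2D
y = data.data[:, 3]    # petal width (cm)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=23
)
lr = LinearRegression().fit(X_train, y_train)
y_pred_test = lr.predict(X_test)

# R^2: share of the variance in y explained by the model (1.0 is a perfect fit).
test_r2 = r2_score(y_test, y_pred_test)
# MSE: average squared distance between true and predicted values.
test_mse = mean_squared_error(y_test, y_pred_test)
print(f"R^2: {test_r2:.3f}, MSE: {test_mse:.3f}")
```

For regressors, `lr.score(X_test, y_test)` returns the same R² value without a separate `predict` call.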

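The second example is a multiple linear regression on a dataset with 7 columns in total: `age`, `sex`, `bmi`, `children`, `smoker`, `region`, and `charges`. Let's look at the dataset first: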

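```python
import pandas as pd

# Load the dataset (the file name 'insurance.csv' is an assumption; use your own path)
df = pd.read_csv('insurance.csv')
df
```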

```python
# Encode each text column as integer category codes
df['sex'] = df['sex'].astype('category')
df['sex'] = df['sex'].cat.codes

df['smoker'] = df['smoker'].astype('category')
df['smoker'] = df['smoker'].cat.codes

df['region'] = df['region'].astype('category')
df['region'] = df['region'].cat.codes
```
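To see what `.cat.codes` produces, here is a small sketch on a toy column with made-up values standing in for `df['smoker']`. For string data, pandas sorts the categories, so the codes follow alphabetical order.

```python
import pandas as pd

# Toy column with made-up values, standing in for df['smoker'].
smoker = pd.Series(["yes", "no", "no", "yes"]).astype("category")

# Categories are sorted, so 'no' gets code 0 and 'yes' gets code 1.
categories = list(smoker.cat.categories)
codes = smoker.cat.codes.tolist()
print(categories)
print(codes)

# Keep the code-to-label mapping so the numbers can be read back later.
mapping = dict(enumerate(smoker.cat.categories))
print(mapping)
```

For a column with more than two values, such as `region`, integer codes impose an order that has no real meaning; one-hot encoding (for example with `pd.get_dummies`) is a common alternative, though the tutorial keeps the simpler codes.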
```python
# Count missing values per column
df.isnull().sum()
```
```
age         0
sex         0
bmi         0
children    0
smoker      0
region      0
charges     0
dtype: int64
```
```python
# Features: every column except the target
X = df.drop(columns='charges')
X
```
```python
# Target
y = df['charges']
```
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)
```
```python
lr_multiple = LinearRegression()
lr_multiple.fit(X_train, y_train)
```
```python
c = lr_multiple.intercept_
c
```
```
-11827.733141795668
```
```python
m = lr_multiple.coef_
m
```
```
array([  256.5772619 ,   -49.39232379,   329.02381564,   479.08499828, 23400.28378787,  -276.31576201])
```
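The coefficients are easier to read when paired with their column names. The sketch below copies the intercept and coefficients from the output above and assumes the feature order is that of the `isnull()` listing with `charges` dropped. The example `row` is invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Feature order after dropping 'charges' (assumed from the isnull() output).
features = ["age", "sex", "bmi", "children", "smoker", "region"]

# Intercept and coefficients copied from the fitted model's output.
c = -11827.733141795668
m = np.array([256.5772619, -49.39232379, 329.02381564,
              479.08499828, 23400.28378787, -276.31576201])

coefs = pd.Series(m, index=features)
# Rank features by the absolute size of their coefficient.
print(coefs.abs().sort_values(ascending=False))

# Prediction for one hypothetical, already-encoded row: c + sum(m_i * x_i).
row = pd.Series({"age": 40, "sex": 1, "bmi": 30.0,
                 "children": 2, "smoker": 0, "region": 1})
pred = c + (coefs * row[features]).sum()
print(pred)
```

Coefficient sizes depend on each feature's scale, so they are not a direct measure of importance. `smoker` is a 0/1 code, so its coefficient reads as the modeled difference in `charges` between the two groups with the other features held fixed.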
```python
y_pred_train = lr_multiple.predict(X_train)
y_pred_test = lr_multiple.predict(X_test)
```
```python
from sklearn.metrics import r2_score

# R^2 on the held-out test set
r2_score(y_test, y_pred_test)
```
```
0.7911113876316933
```
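A score of about 0.79 means the model accounts for roughly 79% of the variance in `charges` on the test set. The sketch below shows the formula behind `r2_score`, R² = 1 - SS_res / SS_tot, on a few made-up numbers (`y_true` and `y_hat` are illustrative, not taken from the dataset).

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# Residual sum of squares: variation the model fails to explain.
ss_res = np.sum((y_true - y_hat) ** 2)
# Total sum of squares: variation of y around its own mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_hat))
```

Comparing `r2_score(y_train, y_pred_train)` with the test score is a quick way to spot overfitting: a much higher training score would suggest the model does not generalize well.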