## Dataset

I am using the Boston housing dataset that ships with the scikit-learn library. It contains information about housing prices in Boston. First, import the necessary packages and load the dataset.

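```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
from sklearn.datasets import load_boston
boston_data = load_boston()
```

```python
# Build a DataFrame from the features and add the target as a "prices" column
df = pd.DataFrame(data=boston_data.data, columns=boston_data.feature_names)
df["prices"] = boston_data.target
```

```python
df.shape
# Output: (506, 14)
```

```python
df.columns
# Output:
# Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
#        'PTRATIO', 'B', 'LSTAT', 'prices'], dtype='object')
```

```python
# Count the missing values in each column
df.isnull().sum()
```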
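To make the missing-value check concrete, here is a minimal sketch on a toy frame. The `toy` DataFrame and its values are made up for illustration and are not part of the Boston data; the point is only to show what `isnull().sum()` reports when a value is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in the RM column
toy = pd.DataFrame({
    "RM": [6.5, np.nan, 7.1],
    "DIS": [4.1, 6.0, 2.5],
})

# isnull() marks each missing cell as True; sum() counts them per column
missing_per_column = toy.isnull().sum()

# Summing again gives the total number of missing cells in the frame
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing:", total_missing)
```

If every count comes back as zero, no imputation or row dropping is needed before the analysis.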

## Dependent and independent variables

Next, we need to identify the dependent variable and the independent variables. A natural question to ask of this dataset is how to predict housing prices from the other features, because buyers do not want to pay more than fair market value. From experience, we can expect housing prices to vary with the other features in the dataset. So, in this dataset, the housing price is the dependent variable and the remaining columns are the independent variables.
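In code, this split usually amounts to separating the target column from everything else. The sketch below uses a small stand-in frame with made-up values so that it runs on its own; the names `X` and `y` are a common convention, not something the dataset defines.

```python
import pandas as pd

# Stand-in frame with made-up values; in the article, df is the
# full Boston DataFrame built earlier.
df = pd.DataFrame({
    "RM": [6.5, 5.9, 7.1, 6.0],
    "DIS": [4.1, 6.0, 2.5, 5.2],
    "prices": [24.0, 19.5, 33.2, 21.0],
})

# Dependent variable (target): the column we want to predict
y = df["prices"]

# Independent variables (features): every other column
X = df.drop(columns=["prices"])

print(X.shape, y.shape)
```

Keeping the target in its own variable makes it harder to leak it into the features by accident later on.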

## Exploratory Analysis

Start with the distribution of the target variable or dependent variable:

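```python
# Distribution of the target variable
sns.set(rc={'figure.figsize': (12, 8)})
sns.distplot(df["prices"], bins=25)
plt.show()
```

```python
# Pairwise correlations between all columns, rounded to two decimals
correlation_matrix = df.corr().round(2)
sns.set(rc={'figure.figsize': (11, 8)})
sns.heatmap(data=correlation_matrix, annot=True)
plt.show()
```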
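A heatmap with 14 columns can be hard to scan, so one option is to pull out only the target's column of the correlation matrix and sort it by absolute value. The sketch below does this on synthetic data: the `demo` frame, its coefficients, and the `NOISE` column are invented for illustration and say nothing about the real Boston values.

```python
import numpy as np
import pandas as pd

# Synthetic data: prices depend strongly on RM, weakly on LSTAT,
# and not at all on NOISE (all values are made up).
rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.3, 0.7, n)
lstat = rng.normal(12.0, 5.0, n)
noise = rng.normal(0.0, 1.0, n)
demo = pd.DataFrame({
    "RM": rm,
    "LSTAT": lstat,
    "NOISE": noise,
    "prices": 9.0 * rm - 0.6 * lstat + rng.normal(0.0, 2.0, n),
})

# Correlation of every feature with the target, without the target itself
corr_with_target = demo.corr()["prices"].drop("prices")

# Rank by strength of the relationship, ignoring the sign
order = corr_with_target.abs().sort_values(ascending=False).index
ranked = corr_with_target.loc[order]

print(ranked.round(2))
```

On the real data, the same three lines applied to `df` would give a ranked list of candidate features to plot against `prices`.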

```python
# Scatter plots of two features against the price
plt.figure(figsize=(20, 5))
features = ['RM', 'DIS']
target = df['prices']
for i, col in enumerate(features):
    plt.subplot(1, len(features), i + 1)
    x = df[col]
    y = target
    plt.scatter(x, y, marker='o')
    plt.title(col)
    plt.xlabel(col)
    plt.ylabel('prices')
```

```python
# Transform two features (AGE cubed, log of DIS) and plot them against the price
plt.figure(figsize=(20, 5))
df['AgeN'] = df['AGE']**3
df['DisN'] = np.log(df['DIS'])
features = ['AgeN', 'DisN']
target = df['prices']
for i, col in enumerate(features):
    plt.subplot(1, len(features), i + 1)
    x = df[col]
    y = target
    plt.scatter(x, y, marker='o')
    plt.title(col)
    plt.xlabel(col)
    plt.ylabel('prices')
```
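Besides eyeballing the scatter plots, one rough way to judge whether a transformation made a relationship more linear is to compare the Pearson correlation before and after. The sketch below uses synthetic data in which the price is constructed to depend on the logarithm of the distance, so it only illustrates the check itself and makes no claim about the Boston data.

```python
import numpy as np
import pandas as pd

# Synthetic data: price is built as a linear function of log(DIS) plus noise
rng = np.random.default_rng(42)
n = 500
dis = np.exp(rng.uniform(0.0, 2.5, n))  # skewed, strictly positive values
prices = 10.0 + 8.0 * np.log(dis) + rng.normal(0.0, 1.0, n)
demo = pd.DataFrame({"DIS": dis, "prices": prices})

# Same transformation as in the article
demo["DisN"] = np.log(demo["DIS"])

# Pearson correlation with the target, before and after the transformation
r_raw = demo["DIS"].corr(demo["prices"])
r_log = demo["DisN"].corr(demo["prices"])

print(f"raw: {r_raw:.3f}, log: {r_log:.3f}")
```

A higher absolute correlation after the transformation suggests the transformed feature is a better fit for a linear model, although the scatter plot remains the more informative check.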
