# Hands-on Linear Regression Using Sklearn

In this tutorial, we'll use scikit-learn to build a Linear Regression model and make predictions with it.

### Problem Statement

The dataset is available at https://www.kaggle.com/c/boston-housing. It can also be loaded from scikit-learn itself through `load_boston`, which is available only in scikit-learn versions before 1.2.

• The Boston Housing dataset contains information about various houses in Boston through different parameters.
• There are 506 samples and 13 feature variables in this dataset.

The objective is to predict house prices from the given features.

Let's get started with the hands-on exercise.

### 1. Importing Libraries

`%matplotlib inline`
`import numpy as np`
`import pandas as pd`
`import matplotlib.pyplot as plt`
`import seaborn as sns`
`from sklearn.preprocessing import StandardScaler`
`from sklearn.model_selection import train_test_split`
`from sklearn.linear_model import LinearRegression`

### Code

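`# load_boston was removed in scikit-learn 1.2, so this needs an older version`
`from sklearn.datasets import load_boston`
`boston = load_boston()`
`print(boston.keys())`
`print("Shape of boston data: ", boston.data.shape)`
`print(boston.feature_names)`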

### Output

The bunch has 4 keys: ['data', 'target', 'feature_names', 'DESCR']. The data has 506 rows and 13 feature variables; notice that this doesn't include the target variable. The names of the columns are extracted as well. Details about the features and more information about the dataset can be seen by printing `boston.DESCR`.

### Code

`print(boston.DESCR)`

### Output

Before applying any EDA or model, we must convert this to a pandas DataFrame, which we can do by passing `boston.data` to `pd.DataFrame`. We also add the target variable from `boston.target` to the DataFrame.

### Code

`bos = pd.DataFrame(boston.data, columns=boston.feature_names)`
`bos['PRICE'] = boston.target`
`bos.head()`
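
The conversion can be tried without the dataset itself. The following is a minimal sketch that assumes a tiny hand-made `Bunch` named `toy`; the three feature names are borrowed from the Boston columns, but every value is invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.utils import Bunch

# Hypothetical stand-in for the object returned by the dataset loader.
# All numbers below are made up; they are not rows of the real dataset.
toy = Bunch(
    data=np.array([[0.10, 6.5, 65.0],
                   [0.20, 6.0, 80.0],
                   [0.05, 7.1, 45.0]]),
    target=np.array([24.0, 21.5, 34.0]),
    feature_names=np.array(["CRIM", "RM", "AGE"]),
)

# Build the DataFrame from the feature matrix and name the columns
toy_df = pd.DataFrame(toy.data, columns=toy.feature_names)

# Append the target as an extra column
toy_df["PRICE"] = toy.target

print(toy_df.head())
```

Passing `columns=` gives the DataFrame readable headers instead of the default integer labels.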

### 3. Train and Test Split of Data

1. We will split the data into 2 parts: 80% of the data will be used to build the model, and the remaining 20% will be kept unseen as validation data to check how well the model generalizes.
2. We will standardize all the input features so that they are on the same scale. You can refer to the concepts of standardization and normalization in the Probability and Statistics module.
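
The two steps above can be sketched on synthetic data. This is an illustration only: `X_demo` and `y_demo` are random arrays with the same shape as the dataset (506 rows, 13 features), not real housing data, and all variable names are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the dataset (random values)
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(506, 13))
y_demo = rng.normal(size=506)

# Step 1: hold out 20% of the rows as unseen test data
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Step 2: learn mean and standard deviation from the training rows only,
# then apply the same transformation to the test rows
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
print(np.round(X_tr_scaled.mean(axis=0), 3))  # per-feature mean of the training set
print(np.round(X_tr_scaled.std(axis=0), 3))   # per-feature std of the training set
```

Fitting the scaler on the training rows only keeps information about the test rows out of the preprocessing step, which is why the test set is transformed but never fitted.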

### Code

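`# features and target (the target is kept separate from the features)`
`X = boston.data`
`Y = boston.target`
`# hold out 20% of the data as the unseen test set`
`X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 42)`
`# fit the scaler on the training data only, then apply it to both sets`
`sc = StandardScaler()`
`X_train = sc.fit_transform(X_train)`
`X_test = sc.transform(X_test)`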
`print(X_train.shape)`
`print(X_test.shape)`
`print(Y_train.shape)`
`print(Y_test.shape)`

### Code

`# creating the model`
`lin_reg_model = LinearRegression()`
`# fitting the model with train data`
`lin_reg_model.fit(X_train, Y_train)`
`# predicting on the test 20% data`
`Y_pred = lin_reg_model.predict(X_test)`
`# weights and intercept of the fitted model`
`optimal_W = lin_reg_model.coef_`
`optimal_b = lin_reg_model.intercept_`
`print("Optimal W: ",optimal_W)`
`print("Optimal intercept(bias): ",np.round(optimal_b,3))`
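
To see what `coef_` and `intercept_` represent, here is a minimal sketch on synthetic data. It assumes noise-free targets generated from hypothetical weights `true_W` and bias `true_b`, so the fitted model should recover them, and it checks that a prediction is simply the features multiplied by the weights plus the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from known (hypothetical) weights and bias
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_W = np.array([2.0, -1.0, 0.5])
true_b = 4.0
y_demo = X_demo @ true_W + true_b  # no noise added

demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# A prediction is the weighted sum of the features plus the intercept
manual_pred = X_demo @ demo_model.coef_ + demo_model.intercept_

print("Recovered W:", np.round(demo_model.coef_, 3))
print("Recovered b:", np.round(demo_model.intercept_, 3))
print("Manual prediction matches predict():",
      np.allclose(manual_pred, demo_model.predict(X_demo)))
```

On real data the targets are noisy, so the weights are estimates rather than exact values.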

### Output

Let us evaluate the various metrics we discussed during linear regression.

### Code

`# error and evaluation metrics of the model`
`error = Y_test - Y_pred`
`MSE = (1/X_test.shape[0]) * np.sum(error**2)`
`RMSE = np.sqrt(MSE)`
`print("MSE: ",np.round(MSE,3))`
`print("RMSE: ",np.round(RMSE,3))`
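
The manual formulas can be cross-checked against scikit-learn's `mean_squared_error`. The sketch below uses four made-up actual and predicted prices (`y_true` and `y_hat` are illustrative names and values, not model output).

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted prices, for illustration only
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_hat = np.array([25.0, 20.6, 32.7, 35.4])

# Manual computation: mean of squared errors, then its square root
err = y_true - y_hat
mse_manual = np.mean(err ** 2)
rmse_manual = np.sqrt(mse_manual)

# Library computation of the same mean squared error
mse_sklearn = mean_squared_error(y_true, y_hat)

print("MSE (manual): ", np.round(mse_manual, 3))
print("MSE (sklearn):", np.round(mse_sklearn, 3))
print("RMSE:         ", np.round(rmse_manual, 3))
```

RMSE is in the same units as the target, which usually makes it easier to interpret than MSE.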

### 4. Visualizations

4.1 Plotting the predicted prices against the actual prices.

`plt.figure(figsize = (20,6))`
`plt.style.use('fivethirtyeight')`
`plt.subplot(121)`
`plt.plot(Y_test,Y_pred,'ro')`
`plt.xlabel("Actual Price")`
`plt.ylabel("Predicted Price")`

Output

4.2 Plotting the distribution of predicted house prices

`plt.figure(figsize = (20,6))`
`plt.style.use('fivethirtyeight')`
`plt.subplot(121)`
`sns.kdeplot(Y_pred, bw_method = 0.5, color = "r", fill = True)`
`plt.xlabel("Predicted Price")`
`plt.ylabel("Distribution")`
`plt.title("With Sklearn")`

Output

4.3 Plotting the error distribution

`plt.figure(figsize = (20,6))`
`plt.style.use('fivethirtyeight')`
`plt.subplot(121)`
`sns.kdeplot(np.array(error), bw_method = 0.5, color = "r", fill = True)`
`plt.xlabel("Error = Actual - Predicted")`
`plt.ylabel("Error Distribution")`
`plt.title("With Sklearn")`

Output
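
As a numerical complement to the error plot, the errors can also be summarized with a few statistics. The sketch below is an assumption-laden illustration: it fits a model on synthetic data with hypothetical weights and Gaussian noise, and it looks at training residuals rather than the test errors used in this tutorial.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: hypothetical weights plus Gaussian noise (std 0.5)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 2))
y_demo = X_demo @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.5, size=300)

demo_model = LinearRegression().fit(X_demo, y_demo)

# Error = Actual - Predicted, computed here on the training data
residuals = y_demo - demo_model.predict(X_demo)

print("mean:", np.round(residuals.mean(), 3))
print("std: ", np.round(residuals.std(), 3))
print("5th/50th/95th percentiles:",
      np.round(np.percentile(residuals, [5, 50, 95]), 3))
```

For a least-squares fit with an intercept, the training residuals average to zero, so a distribution centered near zero is the expected shape; errors on unseen test data need not be centered exactly.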

### 5. Conclusion

We can see how well our model predicts by plotting a scatter plot of the original house prices against the predicted house prices. I hope this first hands-on tutorial on building a machine learning model was fun. To understand it better, you can also try different models on the same problem; you may get better results, and you will gain a better understanding of the problem.
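
One way to try a different model is sketched below. It is only an illustration: the data comes from `make_regression` rather than the housing dataset, and `Ridge` with `alpha=1.0` is an arbitrary choice of alternative, not a recommendation from this tutorial.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem standing in for the housing data
X_demo, y_demo = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Same preprocessing as in the tutorial: fit the scaler on training data only
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

# Candidate models to compare (alpha is chosen arbitrarily)
candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge(alpha=1.0)": Ridge(alpha=1.0),
}

# Fit each candidate and record its RMSE on the held-out test set
rmse_by_model = {}
for name, candidate in candidates.items():
    candidate.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, candidate.predict(X_te))
    rmse_by_model[name] = float(np.sqrt(mse))

for name, rmse in rmse_by_model.items():
    print(name, round(rmse, 3))
```

Comparing models on the same held-out split keeps the comparison fair, since every candidate is scored on identical unseen rows.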
