In this tutorial, we'll look at how to use scikit-learn to predict using a Linear Regression model.

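The data set is available at https://www.kaggle.com/c/boston-housing. It can also be loaded directly from scikit-learn, although `load_boston` was removed in scikit-learn 1.2, so that route requires an older version of the library.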

- The Boston Housing dataset contains information about various houses in Boston through different parameters.
- There are 506 samples and 13 feature variables in this dataset.

The objective is to predict the value of prices of the house using the given features.

Let's get started with the hands-on exercise.

`%matplotlib inline`

`import numpy as np`

`import pandas as pd`

`import matplotlib.pyplot as plt`

`import seaborn as sns`

`from sklearn.preprocessing import StandardScaler`

`from sklearn.model_selection import train_test_split`

`from sklearn.linear_model import LinearRegression`

`from sklearn.datasets import load_boston`

`boston = load_boston()`

`print("Shape of boston data: ",boston.data.shape)`

`print(boston.feature_names)`

The returned bunch has 4 keys: `['data', 'target', 'feature_names', 'DESCR']`. The data has 506 rows and 13 feature variables. Notice that this doesn't include the target variable. The column names are available in `boston.feature_names`, and the details about the features and more information about the dataset can be seen by printing `boston.DESCR`.

`print(boston.DESCR)`

We convert this to a pandas DataFrame before applying any EDA or model, which we can do by passing `boston.data` to `pd.DataFrame`. We also add the target variable from `boston.target` to the DataFrame.

`bos = pd.DataFrame(boston.data, columns = boston.feature_names)`

`bos['PRICE'] = boston.target`

`bos.head()`

- We will split the data into 2 parts, i.e. we will use 80% of the data to build the model, and the remaining 20% will be kept unseen as validation data to check how well the model generalizes.
- We will standardize all the input features so that they are on the same scale. You can refer to the concepts of standardization and normalization in the Probability and Statistics module.
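As a rough illustration of the 80/20 split described in the first point, the sketch below shuffles row indices with NumPy and holds out a fraction of them. The helper name `simple_split` and the synthetic arrays are assumptions made for this example only; the tutorial itself relies on scikit-learn's `train_test_split`, whose internal shuffling differs, so the selected rows will not match.

```python
import numpy as np

def simple_split(X, Y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle the rows, then hold out a fraction for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))               # random ordering of the row indices
    n_test = int(np.ceil(len(X) * test_size))   # number of held-out rows
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

# Synthetic stand-in with the same shape as the Boston data (506 rows, 13 features)
X_demo = np.arange(506 * 13, dtype=float).reshape(506, 13)
Y_demo = np.arange(506, dtype=float)

X_tr, X_te, Y_tr, Y_te = simple_split(X_demo, Y_demo)
print(X_tr.shape, X_te.shape)   # (404, 13) (102, 13)
```

With 506 rows, a 20% hold-out rounds up to 102 test rows and leaves 404 rows for training.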

`X = bos.drop('PRICE', axis = 1)`

`Y = bos['PRICE']`

`X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 42)`

`sc = StandardScaler()`

`X_train = sc.fit_transform(X_train)`

`X_test = sc.transform(X_test)`

`print(X_train.shape)`

`print(X_test.shape)`

`print(Y_train.shape)`

`print(Y_test.shape)`
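To make the `fit_transform` / `transform` distinction above more concrete, here is a minimal NumPy sketch of standardization. The synthetic arrays and the `_demo` names are made up for illustration. The key idea is that the mean and standard deviation are learned from the training rows only and then reused on the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic features on very different scales (illustrative values, not Boston data)
X_train_demo = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(80, 2))
X_test_demo = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(20, 2))

# "fit": learn the statistics from the training rows only
mu = X_train_demo.mean(axis=0)
sigma = X_train_demo.std(axis=0)   # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both sets
X_train_std = (X_train_demo - mu) / sigma
X_test_std = (X_test_demo - mu) / sigma

print(np.round(X_train_std.mean(axis=0), 6))   # approximately zero per column
print(np.round(X_train_std.std(axis=0), 6))    # approximately one per column
```

The test set will generally not have exactly zero mean and unit variance after this step. That is expected, because it is scaled with statistics it did not contribute to.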

`# creating the model`

`lin_reg_model = LinearRegression()`

`# fitting the model with train data`

`lin_reg_model.fit(X_train, Y_train)`

`# predicting on the test 20% data`

`Y_pred = lin_reg_model.predict(X_test)`

`# weights and intercept of the model features`

`optimal_W = lin_reg_model.coef_`

`optimal_b = lin_reg_model.intercept_`

`print("Optimal W: ",optimal_W)`

`print("Optimal intercept(bias): ",np.round(optimal_b,3))`
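For intuition about where the weights and the intercept come from, the sketch below solves an ordinary least squares problem directly with NumPy. It is an illustration of the same objective, not scikit-learn's internal code. The synthetic data and the names `true_W`, `true_b`, `W_hat` and `b_hat` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
true_W = np.array([2.0, -1.0, 0.5])   # weights used to generate the synthetic target
true_b = 4.0                          # intercept used to generate the synthetic target
Y_demo = X_demo @ true_W + true_b + rng.normal(scale=0.01, size=100)

# Append a column of ones so the intercept is learned as one extra weight
X_aug = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])

# Least squares solution of X_aug @ solution ≈ Y_demo
solution, *_ = np.linalg.lstsq(X_aug, Y_demo, rcond=None)
W_hat, b_hat = solution[:-1], solution[-1]

print("W:", np.round(W_hat, 2))
print("b:", np.round(b_hat, 2))
```

Because the noise is small, the recovered weights and intercept should land very close to the values used to generate the data.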

Let us evaluate the various metrics we discussed during linear regression.

`# error and evaluation metrics of the model`

`error = Y_test - Y_pred`

`MSE = (1/X_test.shape[0]) * np.sum(error**2)`

`RMSE = np.sqrt(MSE)`

`print("MSE: ",np.round(MSE,3))`

`print("RMSE: ",np.round(RMSE,3))`
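A small worked example may help in checking the two metrics by hand. The four values below are made up for illustration: the errors are 1, 0, -1 and -2, so the squared errors sum to 6 and the MSE is 6 / 4 = 1.5.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])    # illustrative actual values
y_hat = np.array([2.0, 5.0, 8.0, 11.0])    # illustrative predictions

err = y_true - y_hat        # [ 1.,  0., -1., -2.]
mse = np.mean(err ** 2)     # (1 + 0 + 1 + 4) / 4 = 1.5
rmse = np.sqrt(mse)         # square root puts the error back in the target's units

print("MSE:", mse)                    # MSE: 1.5
print("RMSE:", np.round(rmse, 3))     # RMSE: 1.225
```

Using `np.mean` here is equivalent to dividing the sum of squared errors by the number of test samples, as done above.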

**4.1 Plotting the model fitted line on the output variable**

`plt.figure(figsize = (20,6))`

`plt.style.use('fivethirtyeight')`

`plt.subplot(121)`

`# actual prices on the x-axis, predicted prices on the y-axis`

`plt.plot(Y_test,Y_pred,'ro')`

`plt.xlabel("Actual Price")`

`plt.ylabel("Predicted Price")`

**Output**

**4.2 Plotting the distribution of house prices**

`plt.figure(figsize = (20,6))`

`plt.style.use('fivethirtyeight')`

`plt.subplot(121)`

`sns.kdeplot(Y_pred, bw_method = 0.5, color = "r", fill = True)`

`plt.xlabel("Predicted Price")`

`plt.ylabel("Distribution")`

`plt.title("With Sklearn")`

**Output**

**4.3 Plotting the error distribution**

`plt.figure(figsize = (20,6))`

`plt.style.use('fivethirtyeight')`

`plt.subplot(121)`

`sns.kdeplot(np.array(error), bw_method = 0.5, color = "r", fill = True)`

`plt.xlabel("Error = Actual - Predicted")`

`plt.ylabel("Error Distribution")`

`plt.title("With Sklearn")`

**Output**

We can see how well our model is predicting from the scatter plot of the original house prices against the predicted house prices. I hope this first hands-on tutorial on building a machine learning model was fun. To understand it better, you can also try different models on the same problem; doing so may give you better results as well as a deeper understanding of the problem.


© Padhai Time 2022 | All Rights Reserved
