In part 1 of this machine learning with Python tutorial series, we used a linear regression model and saw an accuracy of 68%.
In this part we will use polynomial regression. Polynomial regression is a special case of linear regression in which we add extra input features built from polynomial terms of the existing inputs (for degree 2, their squares and pairwise products). The model is still linear in its coefficients, so we can keep using LinearRegression.
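To make the idea of "polynomial terms" concrete, here is a minimal sketch in plain Python of what a degree-2 expansion does to a single row of inputs. The helper name expand_degree_2 is hypothetical and only for illustration; it is not part of scikit-learn, although PolynomialFeatures(degree=2) generates the same kinds of terms (a constant, the original features, squares and pairwise products).

```python
from itertools import combinations_with_replacement


def expand_degree_2(row):
    """Hypothetical helper: degree-2 polynomial terms for one row of inputs."""
    terms = [1.0]                               # constant (bias) term
    terms.extend(float(v) for v in row)         # original degree-1 features
    for a, b in combinations_with_replacement(row, 2):
        terms.append(float(a * b))              # squares and pairwise products
    return terms


# Two inputs a=2, b=3 become: 1, a, b, a*a, a*b, b*b
expanded = expand_degree_2([2, 3])
print(expanded)  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

A linear model fitted on these six columns can capture curvature and interactions between a and b, even though it is still just a weighted sum of its inputs.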
Let's do it in Python:
# Start by importing the necessary functions from scikit-learn
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
#importing the dataset from the CSV file
data=pd.read_csv('kc_house_data.csv')
#Selecting only the required columns from the dataset to be used as input (X)
X=data[['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors',
'view', 'condition', 'grade', 'sqft_above',
'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long',
'sqft_living15', 'sqft_lot15']]
#Let's see a snapshot of our input (X) data
X.head()
   bedrooms  bathrooms  sqft_living  sqft_lot  floors  view  condition  grade  sqft_above  sqft_basement  yr_built  yr_renovated  zipcode      lat      long  sqft_living15  sqft_lot15
0         3       1.00         1180      5650     1.0     0          3      7        1180              0      1955             0    98178  47.5112  -122.257           1340        5650
1         3       2.25         2570      7242     2.0     0          3      7        2170            400      1951          1991    98125  47.7210  -122.319           1690        7639
2         2       1.00          770     10000     1.0     0          3      6         770              0      1933             0    98028  47.7379  -122.233           2720        8062
3         4       3.00         1960      5000     1.0     0          5      7        1050            910      1965             0    98136  47.5208  -122.393           1360        5000
4         3       2.00         1680      8080     1.0     0          3      8        1680              0      1987             0    98074  47.6168  -122.045           1800        7503
#Choosing output column
y=data['price']
#Creating a second order polynomial feature
poly = PolynomialFeatures(degree=2)
#Converting our input linear dataset to the polynomial dataset
X_poly = poly.fit_transform(X)
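It helps to know how wide the expanded dataset becomes. The sketch below counts the terms of a full polynomial expansion using the standard formula C(n + d, d) for n features and degree d, which includes the constant term. The function name n_poly_terms is hypothetical; the figure of 17 features comes from the column list selected for X above.

```python
from math import comb


def n_poly_terms(n_features, degree):
    """Hypothetical helper: number of polynomial terms up to `degree`, constant included."""
    return comb(n_features + degree, degree)


# 17 input columns expanded to degree 2:
# 1 constant + 17 originals + 17 squares + 136 pairwise products
n_terms = n_poly_terms(17, 2)
print(n_terms)  # 171
```

Assuming the default settings of PolynomialFeatures, you can compare this number with X_poly.shape[1] in your own session. The count grows quickly with the degree, which is one reason to start with degree 2.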
#Splitting the input data into test and train dataset
X_train, X_test, y_train, y_test=train_test_split(X_poly,y,random_state=0)
#Fitting the train data
linreg_poly=LinearRegression().fit(X_train,y_train)
#calculating the accuracy of our model against the test data
acc_p=linreg_poly.score(X_test,y_test)
acc_p
0.8030208254343177
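We got an accuracy of about 80%, against the 68% we got using plain linear regression. Strictly speaking, the number returned by score for a regression model is the R² score on the test data, not a classification accuracy.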
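For a regression model, the score method of LinearRegression returns the coefficient of determination R², which is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand in plain Python; the function name r_squared and the small lists of numbers are made up for illustration and are not taken from the housing dataset.

```python
def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Illustrative values only
r2 = r_squared([3, 5, 7, 9], [2.5, 5, 7.5, 9])
print(r2)  # approximately 0.975
```

A value of 1.0 means the predictions match the targets exactly, and a value near 0 means the model does no better than always predicting the mean price.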
#Let's predict the price for the 6th row of the test set by passing the matching row of X_test to our model
linreg_poly.predict(X_test[[5]])
#and here is the actual price for that same row
y_test.iloc[[5]]
16227 485000.0
Name: price, dtype: float64
We saw how the accuracy increased when we used polynomial regression. In the next tutorial we will look at some other types of regression algorithms and some plotting applications.
Till then, Happy Learning!