Machine Learning in Python for Dummies – Part 2: Polynomial Regression

In part 1 of this machine learning with Python tutorial series, we used a linear regression model and saw an accuracy of 68%.

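In this part we will use polynomial regression. Polynomial regression is a special case of linear regression in which we add extra input features built from polynomial terms of the existing inputs, such as their squares and pairwise products. The model is still linear in its coefficients, so we can keep using the same LinearRegression class.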
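To make the idea concrete, here is a minimal sketch of what a second-order expansion does to a single row with two inputs. The helper name `expand_degree_2` and the sample row are made up for illustration. It uses only the standard library and is meant to mirror the column order that scikit-learn's `PolynomialFeatures(degree=2)` produces with its default settings (constant term, original inputs, then squares and pairwise products), not to replace it.

```python
from itertools import combinations_with_replacement


def expand_degree_2(row):
    """Return [1, x1, ..., xn, every product xi*xj with i <= j] for one row."""
    features = [1.0]                          # constant (bias) term
    features.extend(float(v) for v in row)    # original degree-1 terms
    # degree-2 terms: squares and pairwise products
    for a, b in combinations_with_replacement(row, 2):
        features.append(float(a * b))
    return features


# A made-up row with two inputs, a = 2 and b = 3
expanded = expand_degree_2([2, 3])
print(expanded)  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0] -> 1, a, b, a*a, a*b, b*b
```

So two inputs become six columns, and the linear model simply learns one coefficient per column.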

Let’s do it in Python:

# Start by importing NumPy, pandas and the necessary classes and functions from scikit-learn
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures 

#importing the dataset from the CSV file
data=pd.read_csv('kc_house_data.csv')

#Selecting only the required columns from the dataset to be used as input (X)

X=data[['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors',
       'view', 'condition', 'grade', 'sqft_above',
       'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long',
       'sqft_living15', 'sqft_lot15']]


#Let's see a snapshot of our input (X) data
X.head()
	bedrooms	bathrooms	sqft_living	sqft_lot	floors	view	condition	grade	sqft_above	sqft_basement	yr_built	yr_renovated	zipcode	lat	long	sqft_living15	sqft_lot15
0	3	1.00	1180	5650	1.0	0	3	7	1180	0	1955	0	98178	47.5112	-122.257	1340	5650
1	3	2.25	2570	7242	2.0	0	3	7	2170	400	1951	1991	98125	47.7210	-122.319	1690	7639
2	2	1.00	770	10000	1.0	0	3	6	770	0	1933	0	98028	47.7379	-122.233	2720	8062
3	4	3.00	1960	5000	1.0	0	5	7	1050	910	1965	0	98136	47.5208	-122.393	1360	5000
4	3	2.00	1680	8080	1.0	0	3	8	1680	0	1987	0	98074	47.6168	-122.045	1800	7503
#Choosing output column
y=data['price']

#Creating a second order polynomial feature
poly = PolynomialFeatures(degree=2)

#Converting our input linear dataset to the polynomial dataset 
X_poly = poly.fit_transform(X)
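It is worth checking how much wider the dataset becomes after this step. With n inputs and degree d, a full polynomial expansion (including the constant term, which `PolynomialFeatures` adds by default) has C(n + d, d) columns. The small helper below is a hypothetical name used only for this sanity check.

```python
from math import comb


def n_poly_features(n_inputs, degree):
    """Number of columns in a full polynomial expansion, constant term included."""
    return comb(n_inputs + degree, degree)


# We selected 17 input columns and used degree=2
n_cols = n_poly_features(17, 2)
print(n_cols)  # 171
```

You can compare this number with `X_poly.shape[1]`. Growing from 17 to 171 columns is the price we pay for the extra flexibility, and it grows quickly with higher degrees.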

#Splitting the input data into test and train dataset
X_train, X_test, y_train, y_test=train_test_split(X_poly,y,random_state=0)

#Fitting the train data
linreg_poly=LinearRegression().fit(X_train,y_train)

#calculating the score of our model against the test data (for regression, score() returns R², the coefficient of determination)
acc_p=linreg_poly.score(X_test,y_test)


acc_p
0.8030208254343177

We got an accuracy (R² score) of about 80%, against the 68% we got using linear regression in part 1.
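The number returned by `score` for a regression model is R², which tells us what fraction of the variation in price the model explains. Below is a rough sketch of how it is computed, written in plain Python. The function name and the tiny price lists are made up for illustration and are not taken from the housing dataset.

```python
def r2_score_manual(y_true, y_pred):
    """R² = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up actual values and predictions
y_true_demo = [100.0, 200.0, 300.0]
y_pred_demo = [110.0, 190.0, 300.0]

r2_demo = r2_score_manual(y_true_demo, y_pred_demo)
print(round(r2_demo, 4))  # close to 1 because the predictions are near the actual values

# Always predicting the mean gives R² = 0
r2_mean = r2_score_manual(y_true_demo, [200.0, 200.0, 200.0])
print(r2_mean)
```

A score of 1 means perfect predictions, 0 means the model does no better than always predicting the average price, and it can even go negative for a very poor model.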

#let's predict the 6th row value of the y_test through our model by inputting the equivalent row of X_test
linreg_poly.predict(X_test[[5]])


#and what is the actual value of that location
y_test.iloc[[5]]

16227    485000.0
Name: price, dtype: float64
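To put a number on how close a single prediction is to the actual price, we can compute the absolute and percentage error. This is only a sketch: the helper name is made up, and the predicted value below is a placeholder chosen for illustration, not the output of our model. Replace it with the value returned by `linreg_poly.predict` on your machine.

```python
def prediction_error(predicted, actual):
    """Return (absolute error, percentage error relative to the actual value)."""
    abs_err = abs(predicted - actual)
    pct_err = 100.0 * abs_err / actual
    return abs_err, pct_err


actual_price = 485000.0            # actual value of row 5 of y_test, shown above
placeholder_prediction = 500000.0  # HYPOTHETICAL value, not the model's real output

abs_err, pct_err = prediction_error(placeholder_prediction, actual_price)
print(f"off by {abs_err:,.0f} ({pct_err:.1f}%)")
```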

We saw how the accuracy increased when we used polynomial regression. In the next tutorial we will see the application of some other types of regression algorithms, along with some plotting.

Till then, Happy Learning!

Debashri, Founder of Rentorsa

Hi, I am Debashree. Apart from managing this webshop, I am also passionate about cooking (you will see the recipes on our YouTube channel), singing, writing and teaching.
