Machine Learning in Python for Dummies – Part1-Linear Regression


Post author: Debashri, Founder of Rentorsa
Post published: May 8, 2019
Post category: Machine Learning

I will use Python 3.7 for this whole tutorial series. The easiest way to start with Python for machine learning is to install Anaconda, which gives you almost all the necessary libraries and tools. Installing Anaconda also installs Jupyter Notebook automatically, and I am using Jupyter Notebook for this tutorial.

We will use a housing price dataset (kc_house_data.csv) to build a linear regression prediction model. Then we will measure how well the trained model predicts prices it has not seen.

Code explanation:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

I am importing all the necessary libraries above.

data=pd.read_csv('kc_house_data.csv')

Reading the whole dataset

data.head()

Getting the overview of the dataset

X=data[['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors',
       'view', 'condition', 'grade', 'sqft_above',
       'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long',
       'sqft_living15', 'sqft_lot15']]

Choosing the necessary columns for input data (X).

y=data['price']

Choosing the output column (y).

X_train, X_test, y_train, y_test=train_test_split(X,y,random_state=0)

Splitting the whole dataset into train and test data. The first part, train, as the name suggests, will be used to train the regression model. The second part, test, will be used to check the accuracy of its predictions.
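To see how much data ends up in each part, here is a small sketch on made-up data (the random arrays X_demo and y_demo are stand-ins, not the housing dataset). When no test_size is passed, as in the call above, train_test_split keeps 25% of the rows for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in for the housing data: 100 rows, 3 feature columns.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# Same style of call as in the tutorial; test_size is left at its default.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# 75 rows go to training and 25 rows go to testing.
print(X_tr.shape, X_te.shape)
```

Passing random_state=0 makes the shuffle repeatable, so the same rows land in train and test on every run.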

linreg=LinearRegression().fit(X_train,y_train)

This creates the model and trains it on the train data.

acc=linreg.score(X_test,y_test)

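We are calculating the score of the model on the test data. For LinearRegression, score returns the coefficient of determination (R²) rather than a percentage of correct answers. In my case I am getting a value of 0.6817, which means the model explains around 68% of the variation in price. Not great, but good enough for now.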
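The number that score gives for a LinearRegression is the R² statistic, computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared differences from the mean. The following is a minimal pure-Python sketch of that formula. The helper r_squared and the prices in it are made up for illustration and are not part of the tutorial's dataset.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up prices, for illustration only.
actual = [200000.0, 300000.0, 400000.0, 500000.0]

# Predicting every value exactly gives the best possible score.
perfect_score = r_squared(actual, list(actual))

# Always predicting the mean price (350000) gives a score of zero.
mean_score = r_squared(actual, [350000.0] * 4)

print(perfect_score, mean_score)
```

A score of 1.0 means perfect predictions and 0.0 means the model does no better than always guessing the average price, so a value near 0.68 sits between those two extremes.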

Now, let's check what it predicts for a single input, say the row at position 5 (the sixth row, since counting starts at 0) of the input (X) of the test dataset.

linreg.predict(X_test.iloc[[5]])

and I get an output of 371119.19927172 against the actual value of 29700. As I said, not great but good enough for now. In the subsequent tutorials we will look at different ways to gradually improve the accuracy.
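A single row can be misleading, so it helps to look at the prediction and the actual value side by side, and at the average error over the whole test set. The sketch below does this on made-up data. The demo DataFrame, its single sqft_living feature, and the price formula are assumptions for illustration, so the numbers it prints will not match the housing results above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: price is roughly 150 per square foot plus random noise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"sqft_living": rng.uniform(500, 4000, size=200)})
demo["price"] = 150 * demo["sqft_living"] + rng.normal(0, 20000, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    demo[["sqft_living"]], demo["price"], random_state=0
)
model = LinearRegression().fit(X_tr, y_tr)

# Prediction and actual value for the test row at position 5.
predicted = model.predict(X_te.iloc[[5]])[0]
actual_price = y_te.iloc[5]
print(f"predicted: {predicted:.0f}, actual: {actual_price:.0f}")

# Mean absolute error: the average size of the miss over all test rows.
mae = np.mean(np.abs(model.predict(X_te) - y_te.to_numpy()))
print(f"mean absolute error: {mae:.0f}")
```

The double brackets in iloc[[5]] keep the row as a one-row table, which is the shape predict expects. The actual value comes from the same position in y_te.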

