Statinfer

204.1.3 Practice : Regression Line Fitting

Link to the previous post : https://statinfer.com/204-1-2-regression-in-python/

In the last post we went through the concept of regression. In this post we will implement and practice linear regression.

Practice : Regression Line Fitting

  • Dataset: AirPassengers\AirPassengers.csv
  • Find the correlation between Promotion_Budget and Passengers
  • Draw a scatter plot between Promotion_Budget and Passengers. Is there any pattern between Promotion_Budget and Passengers?
  • Build a linear regression model on Promotion_Budget and Passengers.
  • Build a regression line to predict the passengers using Inter_metro_flight_ratio
In [6]:
import pandas as pd
air = pd.read_csv("datasets\\AirPassengers\\AirPassengers.csv")
air.shape
Out[6]:
(80, 9)
In [7]:
air.columns.values
Out[7]:
array(['Week_num', 'Passengers', 'Promotion_Budget',
       'Service_Quality_Score', 'Holiday_week',
       'Delayed_Cancelled_flight_ind', 'Inter_metro_flight_ratio',
       'Bad_Weather_Ind', 'Technical_issues_ind'], dtype=object)
In [8]:
air.head(5)
Out[8]:
   Week_num  Passengers  Promotion_Budget  Service_Quality_Score  Holiday_week  Delayed_Cancelled_flight_ind  Inter_metro_flight_ratio  Bad_Weather_Ind  Technical_issues_ind
0         1       37824            517356                4.00000            NO                            NO                      0.70              YES                   YES
1         2       43936            646086                2.67466            NO                           YES                      0.80              YES                   YES
2         3       42896            638330                3.29473            NO                            NO                      0.90               NO                    NO
3         4       35792            506492                3.85684            NO                            NO                      0.40               NO                    NO
4         5       38624            609658                3.90757            NO                            NO                      0.87               NO                   YES
In [9]:
# Find the correlation between Promotion_Budget and Passengers
import numpy as np
np.corrcoef(air.Passengers,air.Promotion_Budget)
Out[9]:
array([[ 1.        ,  0.96585103],
       [ 0.96585103,  1.        ]])
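To see what the off-diagonal value measures, here is a minimal sketch of the Pearson correlation computed by hand with only the standard library. The helper name pearson_r is made up for illustration, and the sample values are just the five rows shown by air.head(5), so the printed number will not equal the 0.966 obtained on all 80 weeks.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their individual spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (numerator)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (used in the denominator)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# First five weeks only, copied from the air.head(5) output
passengers = [37824, 43936, 42896, 35792, 38624]
promotion_budget = [517356, 646086, 638330, 506492, 609658]

r_sample = pearson_r(passengers, promotion_budget)
print(round(r_sample, 4))
```

The result always lies between -1 and 1. A value close to 1, like the one reported for the full dataset, means the two columns rise together almost linearly.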
In [10]:
# Draw a scatter plot between Promotion_Budget and Passengers. Is there any pattern between Promotion_Budget and Passengers?

import matplotlib.pyplot as plt
%matplotlib inline 

plt.scatter(air.Passengers, air.Promotion_Budget)
Out[10]:
<matplotlib.collections.PathCollection at 0x90bda20>
In [11]:
# Build a linear regression model and estimate the expected passengers for a Promotion_Budget of 650,000
# Regression model: Promotion_Budget vs. Passengers count

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(air[["Promotion_Budget"]], air[["Passengers"]])
predictions = lr.predict(air[["Promotion_Budget"]])
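For a single predictor, the ordinary least squares line that LinearRegression fits has a simple closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below applies that formula by hand. The function name fit_line is hypothetical, and the data are only the five rows from air.head(5), so the coefficients will differ from those of the full 80-week fit.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# First five weeks only, copied from the air.head(5) output
promotion_budget = [517356, 646086, 638330, 506492, 609658]
passengers = [37824, 43936, 42896, 35792, 38624]

intercept, slope = fit_line(promotion_budget, passengers)

# Fitted values and residuals (actual minus fitted) for the same rows
fitted = [intercept + slope * b for b in promotion_budget]
residuals = [p - f for p, f in zip(passengers, fitted)]
print(intercept, slope)
```

Because the model includes an intercept, the residuals sum to zero up to floating-point error, which is a quick sanity check on any fitted line.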
In [12]:
import statsmodels.formula.api as sm
model = sm.ols(formula='Passengers ~ Promotion_Budget', data=air)
fitted1 = model.fit()
In [13]:
fitted1.summary()
Out[13]:
OLS Regression Results

Dep. Variable:      Passengers          R-squared:            0.933
Model:              OLS                 Adj. R-squared:       0.932
Method:             Least Squares       F-statistic:          1084.
Date:               Wed, 27 Jul 2016    Prob (F-statistic):   1.66e-47
Time:               11:48:26            Log-Likelihood:       -751.34
No. Observations:   80                  AIC:                  1507.
Df Residuals:       78                  BIC:                  1511.
Df Model:           1
Covariance Type:    nonrobust

                    coef        std err    t        P>|t|   [95.0% Conf. Int.]
Intercept           1259.6058   1361.071   0.925    0.358   -1450.078  3969.290
Promotion_Budget    0.0695      0.002      32.923   0.000   0.065      0.074

Omnibus:            26.624      Durbin-Watson:        1.831
Prob(Omnibus):      0.000       Jarque-Bera (JB):     5.188
Skew:               -0.128      Prob(JB):             0.0747
Kurtosis:           1.779       Cond. No.             2.67e+06
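The exercise asks for the expected passengers at a Promotion_Budget of 650,000, which the cells above never compute. A minimal sketch, assuming we simply plug the rounded coefficients printed in the summary into the line equation (the function name predict_passengers is made up for illustration):

```python
# Coefficients copied from the fitted1 summary (rounded as printed there)
intercept = 1259.6058
slope = 0.0695  # additional passengers per unit of Promotion_Budget

def predict_passengers(promotion_budget):
    """Point estimate from the fitted line: intercept + slope * budget."""
    return intercept + slope * promotion_budget

estimate = predict_passengers(650000)
print(round(estimate))  # → 46435

# In a one-predictor regression, R-squared equals the squared correlation
r = 0.96585103  # from np.corrcoef earlier in the notebook
print(round(r ** 2, 3))  # → 0.933
```

Since the slope is rounded to four decimals, this estimate is approximate. Inside the notebook, fitted1.predict(pd.DataFrame({"Promotion_Budget": [650000]})) should give the full-precision value.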
In [14]:
# Build a regression line to predict the passengers using Inter_metro_flight_ratio

plt.scatter(air.Inter_metro_flight_ratio,air.Passengers)
Out[14]:
<matplotlib.collections.PathCollection at 0xb13f2b0>
In [15]:
# Fit Passengers on Inter_metro_flight_ratio with scikit-learn
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(air[["Inter_metro_flight_ratio"]], air[["Passengers"]])
Out[15]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [16]:
predictions = lr.predict(air[["Inter_metro_flight_ratio"]])
In [17]:
import statsmodels.formula.api as sm
model = sm.ols(formula='Passengers ~ Inter_metro_flight_ratio', data=air)
fitted2 = model.fit()
In [18]:
fitted2.summary()
Out[18]:
OLS Regression Results

Dep. Variable:      Passengers          R-squared:            0.242
Model:              OLS                 Adj. R-squared:       0.232
Method:             Least Squares       F-statistic:          24.90
Date:               Wed, 27 Jul 2016    Prob (F-statistic):   3.58e-06
Time:               11:48:27            Log-Likelihood:       -848.30
No. Observations:   80                  AIC:                  1701.
Df Residuals:       78                  BIC:                  1705.
Df Model:           1
Covariance Type:    nonrobust

                          coef        std err    t        P>|t|   [95.0% Conf. Int.]
Intercept                 2.044e+04   4993.747   4.093    0.000   1.05e+04   3.04e+04
Inter_metro_flight_ratio  3.507e+04   7027.768   4.990    0.000   2.11e+04   4.91e+04

Omnibus:            10.172      Durbin-Watson:        1.385
Prob(Omnibus):      0.006       Jarque-Bera (JB):     10.098
Skew:               0.822       Prob(JB):             0.00641
Kurtosis:           3.573       Cond. No.             9.48
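The two fits can be put side by side using the numbers from their summaries. The sketch below relies on a property of one-predictor regression: the absolute correlation is the square root of R-squared, and its sign matches the sign of the slope. The helper name implied_correlation is hypothetical, and the inputs are the rounded values printed in the two summaries.

```python
import math

# R-squared and slope copied from the two OLS summaries
models = {
    "Promotion_Budget": {"r_squared": 0.933, "slope": 0.0695},
    "Inter_metro_flight_ratio": {"r_squared": 0.242, "slope": 3.507e+04},
}

def implied_correlation(r_squared, slope):
    """|r| = sqrt(R-squared) for a single predictor; r takes the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

for name, m in models.items():
    r = implied_correlation(m["r_squared"], m["slope"])
    print(f"{name}: R-squared={m['r_squared']:.3f}, implied r={r:.3f}")
```

Promotion_Budget explains about 93% of the variation in Passengers, while Inter_metro_flight_ratio explains only about 24%, so the first line is the far better predictor of the two.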

The next post looks at how good the fitted regression line is.

Link to the next post : https://statinfer.com/204-1-4-how-good-is-my-regression-line/
