Link to the previous post: https://statinfer.com/204-1-2-regression-in-python/
In the last post we went through the concept of regression. In this post we will implement and practice linear regression in Python.
Practice: Regression Line Fitting
- Dataset: AirPassengers\AirPassengers.csv
- Find the correlation between Promotion_Budget and Passengers.
- Draw a scatter plot between Promotion_Budget and Passengers. Is there any pattern between Promotion_Budget and Passengers?
- Build a linear regression model on Promotion_Budget and Passengers, and estimate the expected passengers for a Promotion_Budget of 650,000.
- Build a regression line to predict the passengers using Inter_metro_flight_ratio.
In [6]:
import pandas as pd
# Forward slashes work as a path separator on Windows, macOS and Linux alike
air = pd.read_csv("datasets/AirPassengers/AirPassengers.csv")
air.shape
In [7]:
air.columns.values
In [8]:
air.head(5)
In [9]:
# Find the correlation between Promotion_Budget and Passengers
import numpy as np
np.corrcoef(air.Passengers,air.Promotion_Budget)
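np.corrcoef returns a 2x2 correlation matrix. Its off-diagonal entry is the Pearson correlation r, which is the covariance of the two variables divided by the product of their standard deviations. The minimal sketch below computes r by hand and checks it against np.corrcoef. The helper name pearson_r is hypothetical. The numbers are made-up toy values, not the AirPassengers data.

```python
import numpy as np

# Made-up toy values for illustration only (not the AirPassengers data)
budget = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
passengers = np.array([12.0, 19.0, 33.0, 38.0, 52.0])

def pearson_r(x, y):
    """Hypothetical helper: covariance divided by the product of standard deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson_r(budget, passengers)
# np.corrcoef returns the full 2x2 matrix; the diagonal is 1 and r sits off the diagonal
r_numpy = np.corrcoef(budget, passengers)[0, 1]
print(round(r_manual, 4), round(r_numpy, 4))
```

The matrix is symmetric, so entries [0, 1] and [1, 0] hold the same value. pandas gives the same number through Series.corr, for example air.Passengers.corr(air.Promotion_Budget).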
In [10]:
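# Draw a scatter plot between Promotion_Budget and Passengers. Is there any pattern between Promotion_Budget and Passengers?
import matplotlib.pyplot as plt
%matplotlib inline
# Predictor on the x-axis, response (Passengers) on the y-axis
plt.scatter(air.Promotion_Budget, air.Passengers)
plt.xlabel("Promotion_Budget")
plt.ylabel("Passengers")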
In [11]:
# Build a linear regression model and estimate the expected passengers for a Promotion_Budget of 650,000
## Regression model: Passengers as a function of Promotion_Budget
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(air[["Promotion_Budget"]], air[["Passengers"]])
predictions = lr.predict(air[["Promotion_Budget"]])
# Expected passengers when Promotion_Budget = 650,000 (same column name as in training)
lr.predict(pd.DataFrame({"Promotion_Budget": [650000]}))
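With a single predictor, the least-squares line y = b0 + b1 * x has a closed form:

- b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
- b0 = mean(y) - b1 * mean(x)

LinearRegression stores these two numbers in intercept_ and coef_. The minimal numpy sketch below computes them, cross-checks them with np.polyfit and predicts at x = 650,000. The helper name fit_simple_ols is hypothetical. The data are made-up toy values, so the prediction is illustrative only.

```python
import numpy as np

# Made-up toy values for illustration only (not the AirPassengers data)
budget = np.array([400000.0, 450000.0, 500000.0, 550000.0, 600000.0])
passengers = np.array([30.0, 34.0, 37.0, 41.0, 45.0])

def fit_simple_ols(x, y):
    """Hypothetical helper: return (intercept, slope) of the least-squares line."""
    dx = x - x.mean()
    b1 = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

b0, b1 = fit_simple_ols(budget, passengers)

# Cross-check: np.polyfit with deg=1 returns [slope, intercept]
slope_pf, intercept_pf = np.polyfit(budget, passengers, 1)

# Predicted passengers for a promotion budget of 650,000 on the toy line
pred_650k = b0 + b1 * 650000
print(b0, b1, pred_650k)
```

For the model fitted on the real data above, the same two numbers can be read from lr.intercept_ and lr.coef_.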
In [12]:
import statsmodels.formula.api as sm
model = sm.ols(formula='Passengers ~ Promotion_Budget', data=air)
fitted1 = model.fit()
In [13]:
fitted1.summary()
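For a single predictor, the slope row of the summary table (coef, std err, t) can be reproduced by hand:

- se(b1) = sqrt(SSE / (n - 2)) / sqrt(sum((x - mean(x))^2))
- t = b1 / se(b1)

Here SSE is the sum of squared residuals. The numpy sketch below uses made-up toy values. The check afterwards relies on the identity t^2 = (n - 2) r^2 / (1 - r^2), which holds for simple linear regression. On the real model, statsmodels exposes these numbers as fitted1.params, fitted1.bse and fitted1.tvalues.

```python
import numpy as np

# Made-up toy values for illustration only (not the AirPassengers data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

n = len(x)
dx = x - x.mean()
b1 = (dx * (y - y.mean())).sum() / (dx ** 2).sum()  # slope
b0 = y.mean() - b1 * x.mean()                        # intercept

residuals = y - (b0 + b1 * x)
sse = (residuals ** 2).sum()                  # sum of squared errors
sigma_hat = np.sqrt(sse / (n - 2))            # residual standard error, n - 2 degrees of freedom
se_b1 = sigma_hat / np.sqrt((dx ** 2).sum())  # standard error of the slope
t_b1 = b1 / se_b1                             # t statistic for H0: slope = 0

print(f"slope={b1:.4f}  std err={se_b1:.4f}  t={t_b1:.2f}")
```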
In [14]:
# Build a regression line to predict the passengers using Inter_metro_flight_ratio
plt.scatter(air.Inter_metro_flight_ratio,air.Passengers)
In [15]:
from sklearn.linear_model import LinearRegression
# Regression model: Passengers as a function of Inter_metro_flight_ratio
lr = LinearRegression()
lr.fit(air[["Inter_metro_flight_ratio"]], air[["Passengers"]])
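The double brackets in air[["Inter_metro_flight_ratio"]] are deliberate. scikit-learn's fit expects X to be two-dimensional (n_samples x n_features). Single brackets return a one-dimensional Series, which LinearRegression rejects with a "Expected 2D array" error. The small pandas sketch below shows the shape difference on a made-up toy frame.

```python
import pandas as pd

# Made-up toy frame for illustration only (not the AirPassengers data)
toy = pd.DataFrame({
    "Inter_metro_flight_ratio": [0.5, 0.7, 0.6, 0.8],
    "Passengers": [40, 52, 47, 58],
})

single = toy["Inter_metro_flight_ratio"]    # Series: 1-D, shape (4,)
double = toy[["Inter_metro_flight_ratio"]]  # DataFrame: 2-D, shape (4, 1)

print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)

# A 1-D array can also be turned into a single-column 2-D array with reshape
as_column = single.to_numpy().reshape(-1, 1)
print(as_column.shape)
```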
In [16]:
predictions = lr.predict(air[["Inter_metro_flight_ratio"]])
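The predictions array also shows what the R-squared in summary() measures. R^2 = 1 - SSE/SST is the share of the variation in Passengers that the line explains. With a single predictor it equals the squared correlation r^2. The numpy sketch below uses made-up toy values. On the real data the same arithmetic would use air.Passengers and predictions.ravel(). The ravel() is needed because y was passed as a 2-D DataFrame, so predictions has shape (n, 1). scikit-learn's lr.score(X, y) returns the same R^2.

```python
import numpy as np

# Made-up toy values for illustration only (not the AirPassengers data)
ratio = np.array([0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
passengers = np.array([35.0, 41.0, 44.0, 52.0, 55.0, 63.0])

# Least-squares line (np.polyfit returns [slope, intercept] for deg=1)
slope, intercept = np.polyfit(ratio, passengers, 1)
predictions = intercept + slope * ratio

sse = ((passengers - predictions) ** 2).sum()        # unexplained variation
sst = ((passengers - passengers.mean()) ** 2).sum()  # total variation around the mean
r_squared = 1 - sse / sst

r = np.corrcoef(ratio, passengers)[0, 1]
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```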
In [17]:
import statsmodels.formula.api as sm
model = sm.ols(formula='Passengers ~ Inter_metro_flight_ratio', data=air)
fitted2 = model.fit()
In [18]:
fitted2.summary()
The next post looks at how good our regression line is.
Link to the next post: https://statinfer.com/204-1-4-how-good-is-my-regression-line/