Statinfer

204.1.1 Correlation in Python

Understanding correlation.

Correlation

Why do we need correlation?

  • Is there any association between hours of study and grades?
  • Is there any association between number of temples in a city & murder rate?
  • What happens to sweater sales as temperature increases? How strong is the association between them?
  • What happens to ice-cream sales as temperature increases? How strong is the association between them?
  • How can we quantify the association?
  • Which of the above examples has a very strong association?
  • Correlation answers these questions by putting a number on the strength and direction of the association.

Correlation coefficient

  • It is a measure of linear association.
  • r is the ratio of the covariance of X and Y to the product of their standard deviations.
$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
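  • Correlation 0: no linear association
  • Correlation 0 to 0.25: negligible positive association
  • Correlation 0.25 to 0.5: weak positive association
  • Correlation 0.5 to 0.75: moderate positive association
  • Correlation above 0.75: very strong positive association
  • Negative values follow the same bands with the direction reversed (for example, below -0.75 is a very strong negative association).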
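The definition and the strength bands above can be sketched in a few lines of NumPy. This is only an illustration: the `hours` and `grade` values are made-up toy data (not from any dataset in this post), `describe_strength` is a hypothetical helper, and treating each band's upper bound as inclusive is an assumption, since the list does not say which band a boundary value belongs to.

```python
import numpy as np

# Hypothetical toy data (made up for illustration): hours of study vs. grade
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
grade = np.array([52.0, 55.0, 61.0, 70.0, 74.0, 83.0])

# r = Cov(X, Y) / sqrt(Var(X) * Var(Y)), using sample statistics (ddof=1)
cov_xy = np.cov(hours, grade, ddof=1)[0, 1]
r_manual = cov_xy / (np.std(hours, ddof=1) * np.std(grade, ddof=1))

# NumPy's built-in correlation, for comparison with the manual formula
r_numpy = np.corrcoef(hours, grade)[0, 1]


def describe_strength(r):
    """Hypothetical helper: map r to the bands listed above.

    Assumption: each band's upper bound is inclusive.
    """
    if r == 0:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size <= 0.25:
        label = "negligible"
    elif size <= 0.5:
        label = "weak"
    elif size <= 0.75:
        label = "moderate"
    else:
        label = "very strong"
    return f"{label} {direction}"


print(r_manual, r_numpy, describe_strength(r_numpy))
```

The manual ratio and `np.corrcoef` should agree up to floating-point error, because the `ddof` factor cancels between the numerator and the denominator.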

Practice: Correlation Calculation

  • Dataset: AirPassengers\AirPassengers.csv
  • Find the correlation between number of passengers and promotional budget.
  • Draw a scatter plot between number of passengers and promotional budget.
  • Find the correlation between number of passengers and Service_Quality_Score.
In [1]:
import pandas as pd
air = pd.read_csv("datasets\\AirPassengers\\AirPassengers.csv")
air.shape
Out[1]:
(80, 9)
In [2]:
air.columns.values
Out[2]:
array(['Week_num', 'Passengers', 'Promotion_Budget',
       'Service_Quality_Score', 'Holiday_week',
       'Delayed_Cancelled_flight_ind', 'Inter_metro_flight_ratio',
       'Bad_Weather_Ind', 'Technical_issues_ind'], dtype=object)
In [3]:
#Find the correlation between number of passengers and promotional budget.
import numpy as np
np.corrcoef(air.Passengers,air.Promotion_Budget)
Out[3]:
array([[ 1.        ,  0.96585103],
       [ 0.96585103,  1.        ]])
In [4]:
#Draw a scatter plot between number of passengers and promotional budget
import matplotlib.pyplot as plt
%matplotlib inline  
plt.scatter(air.Passengers, air.Promotion_Budget)
Out[4]:
<matplotlib.collections.PathCollection at 0x8feb8d0>
In [5]:
#Find the correlation between number of passengers and Service_Quality_Score
np.corrcoef(air.Passengers,air.Service_Quality_Score)
Out[5]:
array([[ 1.        , -0.88653002],
       [-0.88653002,  1.        ]])
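As an alternative to calling `np.corrcoef` one pair at a time, pandas can return every pairwise correlation at once with `DataFrame.corr()`. The sketch below uses a small made-up frame that only borrows three column names from the dataset; the values are hypothetical and are not taken from AirPassengers.csv, so the numbers it prints will not match the outputs above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values (NOT from AirPassengers.csv); only the column names are reused
toy = pd.DataFrame({
    "Passengers": [37000, 39000, 42000, 45000, 50000, 53000],
    "Promotion_Budget": [517000, 560000, 610000, 640000, 720000, 770000],
    "Service_Quality_Score": [4.0, 3.8, 3.1, 2.9, 2.2, 1.9],
})

# Pearson correlation for every pair of numeric columns (the default method)
corr_matrix = toy.corr()

# Pick single entries out of the matrix by column name
r_budget = corr_matrix.loc["Passengers", "Promotion_Budget"]
r_quality = corr_matrix.loc["Passengers", "Service_Quality_Score"]

print(corr_matrix)
```

On the real data, the same call would be `air.corr()`, assuming all nine columns are numeric.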

Beyond Pearson Correlation

  • Correlation coefficient measures for different types of data
Variable Y \ X            | Quantitative/Continuous X | Ordinal/Ranked/Discrete X    | Nominal/Categorical X
Quantitative Y            | Pearson r                 | Biserial r_b                 | Point biserial r_pb
Ordinal/Ranked/Discrete Y | Biserial r_b              | Spearman rho / Kendall's tau | Rank biserial r_rb
Nominal/Categorical Y     | Point biserial r_pb       | Rank biserial r_rb           | Phi, contingency coefficient, Cramér's V
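To illustrate one entry of the table, Spearman's rho can be computed as the Pearson correlation of the ranks. The sketch below is a minimal NumPy version that assumes there are no tied values; `simple_ranks` is a hypothetical helper, and the data are made up. Libraries such as SciPy provide tie-aware implementations, which are a better choice in practice.

```python
import numpy as np


def simple_ranks(values):
    """Hypothetical helper: ranks 1..n, assuming no tied values."""
    order = np.argsort(values)
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


# Made-up data: y increases with x, but not along a straight line
x = np.arange(1, 11, dtype=float)
y = x ** 3

# Pearson measures linear association, so it falls short of 1 here
pearson = np.corrcoef(x, y)[0, 1]

# Spearman is Pearson applied to the ranks, so any strictly increasing relation gives 1
spearman = np.corrcoef(simple_ranks(x), simple_ranks(y))[0, 1]

print(pearson, spearman)
```

The gap between the two values shows why a rank-based coefficient suits ordinal or monotonic-but-nonlinear data.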

The next post is about regression in Python.

Link to the next post : https://statinfer.com/204-1-2-regression-in-python/
