Correlation
Why do we need correlation?
- Is there any association between hours of study and grades?
- Is there any association between the number of temples in a city and its murder rate?
- What happens to sweater sales as the temperature increases? How strong is the association between them?
- What happens to ice-cream sales as the temperature increases? How strong is the association between them?
- How do we quantify the association?
- Which of the above examples has a very strong association?
- Correlation
Correlation coefficient
- It is a measure of linear association.
- r is the ratio of the covariance of X and Y (how the two variables vary together) to the square root of the product of their individual variances.
$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \cdot \mathrm{Var}(Y)}}$$
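- Correlation = 0: no linear association
- Correlation 0 to 0.25: negligible positive association
- Correlation 0.25 to 0.5: weak positive association
- Correlation 0.5 to 0.75: moderate positive association
- Correlation > 0.75: very strong positive association
- r always lies between -1 and +1; negative values indicate a negative association, with the same bands applied to the magnitude.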
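To make the formula concrete, here is a minimal sketch that computes r directly from the covariance and the two variances, using only the standard library. The function names `pearson_r` and `describe_strength`, the sample numbers, and the handling of values that fall exactly on a cut-off (assigned to the lower band) are assumptions made for illustration; the bands themselves are the ones listed above.

```python
import math

def pearson_r(x, y):
    """Pearson r = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and variances (divide by n - 1).
    # The n - 1 factors cancel in the ratio, so dividing by n gives the same r.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov_xy / math.sqrt(var_x * var_y)

def describe_strength(r):
    """Map r to the bands listed above (assumption: boundary values go to the lower band)."""
    if r == 0:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size <= 0.25:
        label = "negligible"
    elif size <= 0.5:
        label = "weak"
    elif size <= 0.75:
        label = "moderate"
    else:
        label = "very strong"
    return f"{label} {direction} association"

# Hypothetical data: hours of study vs. grades (invented values).
hours_studied = [1, 2, 3, 4, 5]
grades = [52, 60, 61, 70, 77]

r = pearson_r(hours_studied, grades)
print(round(r, 3), "->", describe_strength(r))
```

Because r is a ratio, it has no units and does not change if either variable is rescaled, for example grades out of 10 instead of 100.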
Practice: Correlation Calculation
- Dataset: AirPassengers\AirPassengers.csv
- Find the correlation between number of passengers and promotional budget.
- Draw a scatter plot between number of passengers and promotional budget.
- Find the correlation between number of passengers and Service_Quality_Score.
In [1]:
import pandas as pd
air = pd.read_csv("datasets/AirPassengers/AirPassengers.csv")
air.shape
Out[1]:
In [2]:
air.columns.values
Out[2]:
In [3]:
# Find the correlation between number of passengers and promotional budget.
import numpy as np
np.corrcoef(air.Passengers, air.Promotion_Budget)
Out[3]:
In [4]:
# Draw a scatter plot between number of passengers and promotional budget
import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(air.Passengers, air.Promotion_Budget)
plt.xlabel("Passengers")
plt.ylabel("Promotion_Budget")
Out[4]:
In [5]:
# Find the correlation between number of passengers and Service_Quality_Score
np.corrcoef(air.Passengers, air.Service_Quality_Score)
Out[5]:
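The cells above call `np.corrcoef`, which returns a 2x2 correlation matrix rather than a single number. The sketch below shows how one might pull out the scalar r, and how the same value can be obtained from pandas. The `demo` DataFrame is a hypothetical stand-in with invented values; only the column names are taken from the dataset used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for AirPassengers.csv; the values are invented for illustration.
demo = pd.DataFrame({
    "Passengers": [37800, 60300, 49300, 40800, 62200, 52000],
    "Promotion_Budget": [517000, 646000, 548000, 505000, 710000, 590000],
})

# np.corrcoef returns a symmetric 2x2 matrix with 1s on the diagonal.
corr_matrix = np.corrcoef(demo.Passengers, demo.Promotion_Budget)

# The off-diagonal entry is the correlation between the two columns.
r_numpy = corr_matrix[0, 1]

# pandas gives the same scalar directly; Series.corr uses Pearson by default.
r_pandas = demo["Passengers"].corr(demo["Promotion_Budget"])

print(corr_matrix.shape)
print(round(r_numpy, 4), round(r_pandas, 4))
```

On the real data, replacing `demo` with `air` should work the same way, assuming the columns are numeric and have no missing values.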
Beyond Pearson Correlation
- Correlation coefficient measures for different types of data
Variable Y \ X | Quantitative/Continuous X | Ordinal/Ranked/Discrete X | Nominal/Categorical X |
Quantitative Y | Pearson r | Biserial r_b | Point biserial r_pb |
Ordinal/Ranked/Discrete Y | Biserial r_b | Spearman rho / Kendall’s tau | Rank biserial r_rb |
Nominal/Categorical Y | Point biserial r_pb | Rank biserial r_rb | Phi, contingency coefficient, Cramér's V |
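As an illustration of one entry in the table, Spearman's rho can be computed as the Pearson correlation of the ranks of the two variables. The sketch below uses invented data in which y increases with x, but not linearly, so the two coefficients differ. The variable names and values are assumptions for illustration.

```python
import pandas as pd

# Invented data: y always increases with x, but not along a straight line.
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
y = x ** 3

# Pearson measures linear association, so it falls short of 1 here.
pearson = x.corr(y)

# Spearman's rho is Pearson's r applied to the ranks (average ranks for ties).
spearman = x.rank().corr(y.rank())

print("Pearson :", round(pearson, 3))
print("Spearman:", round(spearman, 3))
```

pandas also accepts `method="spearman"` and `method="kendall"` in `Series.corr`; the rank-based version is shown here because it makes the definition explicit.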
The next post is about regression in Python.
Link to the next post: https://statinfer.com/204-1-2-regression-in-python/