203.1.3 Beyond Pearson Correlation

Machine Learning with R - Pearson Correlation

In the previous section, Practice: Correlation Calculation in R, we calculated correlation in R.

The correlation coefficient used previously was the Pearson correlation coefficient, named after Karl Pearson, who developed it. If the correlation is between X and Y and both X and Y are continuous, the Pearson coefficient works well. But there are cases where it doesn’t work.

How do we find the correlation between an indicator variable and a continuous variable? How do we quantify the association between two indicator variables? How do we quantify the association between two categorical variables?

The Pearson correlation coefficient is not appropriate for the correlation between a delayed-flight indicator and the number of passengers, because the delayed-flight indicator is an indicator variable while the number of passengers is a continuous variable. It is also not appropriate for two indicator variables such as bad weather and technical issue, since neither of them is continuous. For these cases a different correlation coefficient has to be used, depending on the type of the data.
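For the delayed-flight indicator and the number of passengers, one measure listed in the table further down for a categorical and a quantitative variable is the point-biserial coefficient. The following is a minimal sketch in base R that computes it from its textbook formula; the data values and the variable names `passengers` and `delayed` are made up for illustration.

```r
# Hypothetical data: passengers per flight and a 0/1 delayed-flight indicator
passengers <- c(120, 150, 180, 200, 90, 110, 170, 210, 95, 160)
delayed    <- c(0, 0, 1, 1, 0, 0, 1, 1, 0, 1)

# Mean of the continuous variable within each group
m1 <- mean(passengers[delayed == 1])
m0 <- mean(passengers[delayed == 0])

# Proportion of delayed flights and its complement
p <- mean(delayed)
q <- 1 - p

# Population standard deviation (divides by n, not n - 1)
s_n <- sqrt(mean((passengers - mean(passengers))^2))

# Point-biserial coefficient: (M1 - M0) / s_n * sqrt(p * q)
r_pb <- (m1 - m0) / s_n * sqrt(p * q)
print(r_pb)
```

A value close to +1 would suggest that delayed flights tend to carry more passengers, and a value close to -1 the opposite.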

If one of the variables is ordinal/ranked/discrete and the other is continuous/quantitative, we can use the biserial correlation coefficient.

If both variables are nominal/categorical, we can use the phi coefficient or the contingency coefficient.
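As a rough illustration for two indicator variables such as bad weather and technical issue, the sketch below builds a 2x2 table in base R and computes the phi coefficient and the contingency coefficient. The counts are invented, and the names `bad_weather` and `tech_issue` are assumptions made for this example only.

```r
# Hypothetical indicators for 40 flights (1 = yes, 0 = no)
bad_weather <- rep(c(1, 1, 0, 0), times = c(15, 5, 5, 15))
tech_issue  <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 15))

# 2x2 contingency table of the two indicators
tab <- table(bad_weather, tech_issue)
print(tab)

# Cell counts, converted to numeric to avoid integer overflow in products
n11 <- as.numeric(tab["1", "1"])
n10 <- as.numeric(tab["1", "0"])
n01 <- as.numeric(tab["0", "1"])
n00 <- as.numeric(tab["0", "0"])

# Phi coefficient for a 2x2 table
phi <- (n11 * n00 - n10 * n01) /
  sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

# Contingency coefficient from the chi-square statistic
# (correct = FALSE turns off the continuity correction)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n    <- sum(tab)
C    <- sqrt(chi2 / (chi2 + n))

print(phi)
print(C)
```

Phi is defined for 2x2 tables only, while the contingency coefficient can also be computed from larger tables of categorical variables.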

There are various kinds of correlation coefficients, but the most widely used among them is the Pearson correlation coefficient. The right coefficient depends on the type of problem. R has package support for calculating the various kinds of correlation coefficients. The table below shows which correlation measure to choose for each combination of variable types.

| Variable Y | Quantitative/Continuous X | Ordinal/Ranked/Discrete X | Nominal/Categorical X |
|---|---|---|---|
| Quantitative Y | Pearson | Biserial | Point Biserial |
| Ordinal/Ranked/Discrete Y | Biserial | Spearman rho/Kendall’s | Rank Biserial |
| Nominal/Categorical Y | Point Biserial | Rank Biserial | Phi, Contingency Coeff |
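For the case where both variables are ordinal or ranked, the table lists Spearman's rho and Kendall's coefficient. Both are available through the `method` argument of base R's `cor()`. The sketch below uses invented data (`delay_rank` and `complaints` are hypothetical names) in which one variable always increases with the other, but not along a straight line.

```r
# Hypothetical data: a rank from 1 to 10 and a value that grows with it,
# monotonically but not linearly
delay_rank <- 1:10
complaints <- delay_rank^3

# Rank-based coefficients
rho <- cor(delay_rank, complaints, method = "spearman")
tau <- cor(delay_rank, complaints, method = "kendall")

# Pearson coefficient on the same data, for comparison
r <- cor(delay_rank, complaints, method = "pearson")

print(c(spearman = rho, kendall = tau, pearson = r))
```

Because the ordering of the two variables agrees perfectly, both rank coefficients equal 1, while the Pearson coefficient stays below 1 since the relationship is not linear.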


The next post is about Introduction to Regression.

27th January 2017
