In the previous section, we covered Practice: Correlation Calculation in R.
The correlation coefficient used previously was the Pearson correlation coefficient, named after Pearson, who developed it. If the correlation is between X and Y and both X and Y are continuous, the Pearson coefficient works well. But there are cases where it doesn’t.
How do we find the correlation between an indicator variable and a continuous variable? How do we quantify the association between two indicator variables, or between two categorical variables? The Pearson correlation coefficient is not the appropriate measure for the relationship between a delayed-flight indicator and the number of passengers, because the delayed-flight indicator is an indicator variable while the number of passengers is continuous. It is also not appropriate for two indicator variables such as bad weather and technical issue, since neither of them is continuous. For these cases, a special correlation coefficient is used, chosen according to the type of the data.
If one of the variables is ordinal/ranked/discrete and the other is continuous/quantitative, we can use the biserial correlation coefficient.
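As a rough illustration of this case, the sketch below computes the biserial coefficient in base R for the simplest situation, where the discrete variable has only two levels. The data are made-up toy values, the variable names `delayed` and `passengers` are hypothetical, and the point-biserial coefficient is included as an intermediate step that the text above does not mention. The formulas are the usual textbook definitions, so treat this as a sketch under those assumptions rather than a definitive implementation.

```r
# Hypothetical toy data (not from a real data set):
# delayed = 1 if the flight was delayed, 0 if it was on time
delayed    <- c(0, 0, 1, 0, 1, 1, 0, 1, 0, 0)
passengers <- c(120, 150, 165, 135, 140, 180, 170, 155, 145, 160)

# Group means of the continuous variable
m1 <- mean(passengers[delayed == 1])
m0 <- mean(passengers[delayed == 0])

# Population standard deviation (divides by n, not n - 1)
s_n <- sqrt(mean((passengers - mean(passengers))^2))

# Proportions of the two groups
p <- mean(delayed == 1)
q <- 1 - p

# Point-biserial coefficient: treats the two groups as a true dichotomy
r_pb <- (m1 - m0) / s_n * sqrt(p * q)

# Biserial coefficient: assumes a normally distributed variable
# underlies the two groups; y is the normal density at the cut point
y   <- dnorm(qnorm(p))
r_b <- r_pb * sqrt(p * q) / y

print(c(point_biserial = r_pb, biserial = r_b))
```

The biserial value is always larger in magnitude than the point-biserial value, and with strongly separated groups it can even exceed 1, which is one reason to check whether the underlying-normality assumption is reasonable for your data.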
If both variables are nominal/categorical, we can use the phi coefficient or the contingency coefficient.
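Both of these measures can be derived from the chi-square statistic of a contingency table. The sketch below uses base R only; the counts are invented for illustration, and the labels `bad_weather` and `technical_issue` simply reuse the example variables mentioned earlier.

```r
# Hypothetical 2 x 2 table of flight counts (made-up numbers)
tab <- matrix(c(40, 10,
                15, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(bad_weather     = c("no", "yes"),
                              technical_issue = c("no", "yes")))

# Chi-square statistic without Yates' continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n    <- sum(tab)

# Phi coefficient (for 2 x 2 tables)
phi <- sqrt(chi2 / n)

# Contingency coefficient (also usable for larger tables)
contingency_coeff <- sqrt(chi2 / (chi2 + n))

print(c(phi = phi, contingency = contingency_coeff))
```

Computed this way, phi is unsigned; it measures the strength of the association but not its direction. The contingency coefficient can never reach 1, so it is best used to compare tables of the same size.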
There are various kinds of correlation coefficients, but the most widely used among them is the Pearson correlation coefficient. We have to choose the correlation coefficient according to the type of problem. R has package support for calculating the various kinds of correlation coefficients. You can refer to this table for a better understanding of how to choose the right correlation measure for different types of variables.
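|Variable X|Variable Y|Correlation measure|
|---|---|---|
|Quantitative/Continuous|Quantitative/Continuous|Pearson|
|Ordinal/Ranked/Discrete|Quantitative/Continuous|Biserial|
|Nominal/Categorical|Nominal/Categorical|Phi, Contingency Coeff|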
The next post is about Introduction to Regression.