Regression is an important topic in machine learning. Once we understand the concepts, building the models and improving them becomes much easier. To start with, we consider a closely related topic: correlation.
Is there any association between the number of hours one studies and the marks scored?
Is there any relation between the number of temples/churches in the city and the frequency of community riots?
Why do sweater sales increase in winter and ice cream sales increase in summer? And what happens to sweater sales in summer and ice cream sales in winter?
In all the above cases we know there is some association between the two quantities, and the strength of that association differs from case to case. To compare them, we need to quantify the association, and the measure we use for this is called correlation.
Correlation is a measure of the linear association between two variables: it tells us what happens to one variable when the other increases or decreases.
The correlation coefficient ‘r’ is the ratio of the covariance of the two variables (how they vary together) to the product of their separate standard deviations.
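The correlation coefficient always lies between -1 and +1. A value close to +1 indicates a strong positive linear association, a value close to -1 indicates a strong negative linear association, and a value close to 0 indicates little or no linear association.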
r = (covariance of X and Y) ÷ sqrt(variance of X × variance of Y)
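As a quick illustration of the formula, here is a minimal sketch in R. The `hours` and `marks` values are made-up numbers for the study-hours question above, not real data. The sketch computes r from the definition using base R's `cov()` and `var()`, then compares it with the built-in `cor()` function.

```r
# Hypothetical data (invented for illustration): hours studied and marks scored
hours <- c(1, 2, 3, 4, 5, 6)
marks <- c(35, 45, 50, 62, 70, 81)

# Correlation from the definition:
# covariance of X and Y divided by sqrt(variance of X * variance of Y)
r_manual <- cov(hours, marks) / sqrt(var(hours) * var(marks))

# Built-in Pearson correlation, for comparison
r_builtin <- cor(hours, marks)

print(r_manual)
print(r_builtin)

# The two values should agree up to floating-point rounding
print(isTRUE(all.equal(r_manual, r_builtin)))
```

Because marks rise steadily with hours in this toy data, r should come out positive and close to +1. Reversing one of the vectors, for example with `rev(marks)`, would flip the sign.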
To understand correlation better, we will do a small exercise: we will take the air passengers data and see what the correlation is between the variables in it. This follows in the next part of this session.
In the next section, we will work through a practice session on correlation calculation in R.
Practice : Correlation Calculation in R