Pearson’s correlation coefficient is a statistic that measures the strength and direction of the linear relationship between two variables.
Consider an example of a company with over 100 employees that wants to answer the following question:
Do employees with more years of experience tend to be paid more?
[Figure: scatter plot of salary against years of experience]
The scatter plot above shows that salary tends to increase as years of experience increase.
So, we can say there is a positive correlation between years of experience and salary.
Values of Pearson’s correlation:
Pearson’s ‘r’ value ranges from -1 to +1.
– When r is greater than 0, there is a positive relationship between the two variables: when one variable increases, the other tends to increase too.
– When r is less than 0, there is a negative relationship between the two variables: when one variable increases, the other tends to decrease.
– A value of 0 indicates that there is no linear relationship between the variables.
The closer the value of r is to +1 or -1, the stronger the linear relationship.
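To make these ranges concrete, here is a minimal sketch that computes r directly from its standard definition (the covariance of the two variables divided by the product of their standard deviations). The helper name `pearson_r_manual` and the small arrays are made up for illustration and are not part of the original example.

```python
import numpy as np

def pearson_r_manual(x, y):
    """Pearson's r from its definition (illustrative helper, not a library function)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Made-up values, symmetric around zero
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_up = pearson_r_manual(x, 2 * x + 1)     # points on an increasing line
r_down = pearson_r_manual(x, -3 * x + 4)  # points on a decreasing line
r_curve = pearson_r_manual(x, x ** 2)     # symmetric curve, no linear trend

print(r_up, r_down, r_curve)
```

The first two values sit at the positive and negative ends of the range, while the third is 0 even though y depends entirely on x. This is why a value of 0 should be read as "no linear relationship" rather than "no relationship at all".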
Code:
Pearson’s r can be calculated in Python with SciPy as follows:
from scipy.stats import pearsonr
# experience_years and salary are equal-length sequences of numbers
pearson_r, p_value = pearsonr(experience_years, salary)
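If the p-value is smaller than the chosen significance level (commonly α = 0.05), we can conclude that the correlation is statistically significant, meaning a correlation this strong would be unlikely if the two variables were actually uncorrelated.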
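For a version that runs end to end, the sketch below generates synthetic data in place of the company's records and then applies the significance check. The numbers (100 employees, a base salary of 30,000, a raise of 2,500 per year, and the noise level) are assumptions chosen only for illustration, not real figures.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-in for the company's data (all values are hypothetical)
rng = np.random.default_rng(seed=0)
experience_years = rng.uniform(0, 20, size=100)
salary = 30_000 + 2_500 * experience_years + rng.normal(0, 5_000, size=100)

pearson_r, p_value = pearsonr(experience_years, salary)

alpha = 0.05  # significance level
print(f"r = {pearson_r:.3f}, p-value = {p_value:.3g}")
if p_value < alpha:
    print("The correlation is statistically significant.")
else:
    print("No statistically significant linear correlation was detected.")
```

Because the synthetic salaries are built to rise with experience, r should come out strongly positive here. With real data the result depends on the data itself, and a significant correlation alone does not show that experience causes higher pay.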