Correlation does not imply causation

There is a well-known principle in statistics that correlation does not imply causation. It means that even if we observe that two variables behave in the same way, we should not conclude that the behavior of one of those variables causes (or is even related to) the behavior of the other.

In statistics and data mining, we can calculate the correlation between two variables or time series to measure how closely they vary together. The correlation usually takes values in the range [-1,1], where -1 indicates a negative correlation (two variables that behave in opposite ways), 0 indicates no correlation, and 1 indicates a positive correlation. Two variables that have a high correlation may be related. But if two variables have a high correlation and are not actually related, this is called a spurious correlation.
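To make the [-1,1] range concrete, here is a minimal sketch of the Pearson correlation coefficient, one common correlation measure, written in plain Python. The function name `pearson` and the small data lists are made up for this illustration and do not come from any particular library or dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its own mean
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative data (an assumption, not real measurements)
up = [1, 2, 3, 4, 5]
also_up = [2, 4, 6, 8, 10]   # rises together with `up`
down = [10, 8, 6, 4, 2]      # falls while `up` rises
peak = [1, 3, 5, 3, 1]       # rises then falls symmetrically

r_pos = pearson(up, also_up)   # close to 1
r_neg = pearson(up, down)      # close to -1
r_zero = pearson(up, peak)     # close to 0

print(r_pos, r_neg, r_zero)
```

Note that the measure only looks at how the numbers vary together. It knows nothing about what the variables mean, which is why a high value cannot by itself tell us that two variables are related.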

To be convinced of the principle that correlation does not imply causation, I will share a few examples from a very good website on this topic ( http://tylervigen.com/ ), which lists thousands of spurious correlations.

[Figure: spurious correlation]
Correlation of 0.78
[Figure: spurious correlation 2]
Correlation of 0.66
[Figure: spurious correlation 3]
Correlation of 0.99
[Figure: spurious correlation of time series]

Obviously, these correlations are totally spurious, although the variables show very similar behavior. This shows the need to always look further than just using a correlation measure.

Those are just a few examples of spurious correlations. If you try the website, you can also browse various variables to find other spurious correlations.
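One reason such coincidences are easy to find is that when many unrelated variables are compared, some pairs will look strongly correlated just by chance. The following is a small simulation sketch of that effect, not a description of how the website works: it generates independent random walks and searches for the most correlated pair. The names `pearson` and `random_walk`, the seed, and the sizes (200 series of 20 points) are arbitrary choices made for this illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def random_walk(n, rng):
    """A series of n points where each step adds independent Gaussian noise."""
    value = 0.0
    walk = []
    for _ in range(n):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk

rng = random.Random(42)  # fixed, arbitrary seed so the run is repeatable
# 200 series generated independently: none of them influences another
series = [random_walk(20, rng) for _ in range(200)]

# Compare every pair and keep the one with the largest absolute correlation
best_r, best_pair = 0.0, None
for i in range(len(series)):
    for j in range(i + 1, len(series)):
        r = pearson(series[i], series[j])
        if abs(r) > abs(best_r):
            best_r, best_pair = r, (i, j)

print(f"strongest correlation among unrelated series: {best_r:.2f} (pair {best_pair})")
```

Every series here is unrelated to the others by construction, so any strong correlation that the search reports is spurious. Searching through thousands of pairs is what makes such a pair likely to turn up.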

Conclusion

In this short blog post, I have shown a few examples of spurious correlations, which I think are quite interesting. If you have comments, please share them in the comments section below.


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.


Comments

  1. So if we cannot at least explain the correlation, does that mean it does not make sense in the application?

    • Yes, as shown in this blog post, two time series may be correlated, but it does not mean that they are influencing each other or are even related. If two time series are correlated, it just means that we have observed that they vary in the same way, but it can be just by chance. So we always need to be careful about how we interpret the result of a correlation. We need to think about whether it really makes sense or whether it is just by chance…
