Friday, October 08, 2010

Correlation vs Causality

I was in Chennai a few years ago at a conference, sitting on the terrace of a restaurant with a few colleagues. Sales of cold drinks were at an all-time high, with everybody ordering Coke, beer, etc.
At the same time, I noticed that there was a huge influx of patients at the hospital near the restaurant.
So there must have been a positive correlation between the data for the sale of cold drinks and the data for the inflow of patients to the hospital, meaning that as one went up or down, the other went up or down too.
However, does that mean the cold drinks caused people to be hospitalized? Or vice versa: did people drink because someone was hospitalized?
Neither explanation was correct in this situation. In reality, both the influx of people to the hospital and the sales of cold drinks were caused by the sweltering heat of Chennai. So if we were to forecast the sale of cold drinks, the causal factor would be "temperature" and not the "number of people admitted to the nearby hospital".
And this is precisely one of the key things to watch out for when analyzing the results of a regression analysis. Regression will tell you how strongly two variables move together, but it may take an expert to confirm whether there is causality between the two.
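
To make the point concrete, here is a small Python sketch of the Chennai story. Everything in it is made up for illustration: the temperature range, the slopes, the noise levels and the variable names are assumptions, not real data from that day. It simulates drink sales and hospital admissions that both depend only on temperature, then compares the raw correlation between the two with the correlation that remains once temperature is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after fitting a straight line on xs (least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: temperature (deg C) drives BOTH series.
# Neither series depends on the other.
temperature = [rng.uniform(25, 42) for _ in range(500)]
drink_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
admissions = [2 * t + rng.gauss(0, 5) for t in temperature]

# Raw correlation: sales and admissions appear strongly related.
raw_corr = pearson(drink_sales, admissions)

# Partial correlation: remove the temperature effect from both, then correlate.
partial_corr = pearson(
    residuals(drink_sales, temperature),
    residuals(admissions, temperature),
)

print(f"sales vs admissions (raw): {raw_corr:.2f}")
print(f"sales vs admissions (temperature removed): {partial_corr:.2f}")
```

With these assumed numbers the raw correlation comes out strongly positive, while the correlation after removing temperature falls close to zero. Note that the code can only do this because we already suspected temperature was the common cause, and spotting that is exactly where the expert comes in.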
