Friday, October 4, 2013

Causation and correlation


I was a little confused when a friend, responding to one of my Facebook posts, warned me that I was confusing correlation with causation. The remark stayed on my mind for several months as I tried to work out whether I really was confusing the two. When I thought of correlation, the first thing that came to mind was my school days. The weather was less deceptive then. Whether school reopened on the first, second or third of June, the start of the rainy season coincided exactly with the reopening, and we would all be soaked in rain that evening. We enjoyed the first rain of the season and the first day of the school year on the same day, and the same thing happened every year. So we can easily draw a correlation between the two. But does that imply that one of them caused the other? I would say it does not. Let us consider another example. Every time I flip the switch, my lamp glows. As in the previous case, we can derive a correlation between the two events. Here, however, I can definitely conclude that flipping the switch causes the lamp to glow. So how can I tell whether an observed correlation really reflects a cause?

Generally, when one factor (A) is observed only to be correlated with another factor (B), it is sometimes taken for granted that A causes B, even when no evidence supports it. This is a logical fallacy because there are at least five possibilities. The first is that A may be the cause of B. The second is that B may be the cause of A. The third is that some unknown third factor C may actually be the cause of both A and B. The fourth is that there may be a combination of the above three relationships. For example, B may be the cause of A at the same time as A is the cause of B (contradicting the assumption that the only relationship between A and B is that A causes B); this describes a self-reinforcing system. The fifth is that the "relationship" is a coincidence, or is so complex or indirect that it is more effectively called a coincidence (that is, two events that have no direct relationship to each other beyond occurring at the same time). A larger sample size helps to reduce the chance of a coincidence, unless there is a systematic error in the experiment.
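The point about sample size can be tried out with a small simulation. What follows is only a rough sketch under made-up assumptions: the two series in each pair are independent random numbers, so any correlation between them is pure coincidence, and the helper names (pearson, strongest_chance_correlation) are my own inventions for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_chance_correlation(sample_size, pairs, rng):
    """Largest |r| found among `pairs` pairs of unrelated random series."""
    best = 0.0
    for _ in range(pairs):
        # a and b are generated independently: neither causes the other
        a = [rng.gauss(0, 1) for _ in range(sample_size)]
        b = [rng.gauss(0, 1) for _ in range(sample_size)]
        best = max(best, abs(pearson(a, b)))
    return best

rng = random.Random(0)  # fixed seed so the run is repeatable
small = strongest_chance_correlation(sample_size=5, pairs=200, rng=rng)
large = strongest_chance_correlation(sample_size=500, pairs=200, rng=rng)
print(f"strongest chance correlation with 5 observations:   {small:.2f}")
print(f"strongest chance correlation with 500 observations: {large:.2f}")
```

With only five observations per series, some of the unrelated pairs look strongly correlated just by luck; with five hundred observations, even the luckiest pair shows only a weak correlation. It says nothing about systematic errors, which a larger sample does not remove.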

Since the 1950s, both the atmospheric CO2 level and obesity levels have increased sharply. If we conclude that atmospheric CO2 causes obesity, or vice versa, it is a false conclusion. Richer populations tend to eat more food and consume more energy, so rising wealth is a plausible common factor behind both trends; the two have no direct impact on one another. Similarly, if a person with an ailment takes medicine, prays, and is cured, concluding that the cure is due to prayer is a false conclusion, since not all people who pray are cured. Concluding that the cure is due to the medicine is more logical. However, since not all people who take medicine are cured, there may be other important factors.

Consider a weight suspended by a string. If you cut the string, the weight will fall. If you repeat the experiment again and again, the same thing will happen. So we can conclude that cutting the string causes the weight to fall. The cause of the event seems established beyond doubt. Now let us examine the event again. If the string is not cut, the weight will not fall, which means cutting the string is the cause. But even if the string is cut, the weight would have remained in position had there been no gravitational force, which leads us to the conclusion that gravity is the cause. Again, if you look at the act of cutting, the string is cut because we applied a force, which means that the application of force is the cause. Even with that force, if the knife were not sharp the string would have remained intact, which makes the sharpness of the knife the cause. And we were able to cut the string with a sharp knife only because the knife is harder than the string. In short, the perspective of the person observing the event is also crucial. It would be illogical to single out any one of the above as the cause of the event. A more logical conclusion is to acknowledge all of them together as the cause.

Another example is the observation that as ice cream sales increase, the rate of drowning deaths increases sharply. One might therefore conclude that ice cream consumption causes drowning. This conclusion fails to recognize the importance of season and temperature to ice cream sales. Ice cream is sold at a much greater rate during the hot summer months than during colder times, and it is during these hot months that people are more likely to engage in activities involving water, such as swimming. The increased drowning deaths are simply caused by more exposure to water-based activities, not by ice cream. The stated conclusion is false.
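The ice cream example can also be sketched as a toy simulation. Everything in it is invented for illustration: the temperature range, the numbers linking temperature to sales and to a drowning-risk index, and the noise levels are assumptions of mine, not real data. The only thing built into it is that temperature drives both quantities and that neither affects the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
days = []
for _ in range(20000):
    temperature = rng.uniform(0, 35)                     # hypothetical daily temperature, Celsius
    ice_cream = 10 * temperature + rng.gauss(0, 30)      # sales depend on temperature only
    drowning_risk = 0.2 * temperature + rng.gauss(0, 1)  # risk index depends on temperature only
    days.append((temperature, ice_cream, drowning_risk))

# Correlation across all days, hot and cold mixed together
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Correlation among days of nearly the same temperature (20 to 21 degrees),
# which holds the third factor roughly fixed
similar_days = [d for d in days if 20 <= d[0] < 21]
within_band = pearson([d[1] for d in similar_days], [d[2] for d in similar_days])

print(f"all days together:        r = {overall:.2f}")
print(f"days at about 20 degrees: r = {within_band:.2f}")
```

Across all days the two series are strongly correlated, yet among days of about the same temperature the correlation nearly disappears. That is what one would expect if a third factor is the cause of both.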

Much scientific evidence is based upon correlations between variables, that is, on variables observed to occur together. Scientists are careful to point out that correlation does not necessarily mean causation. The assumption that A causes B simply because A correlates with B is not accepted as a legitimate form of argument. However, people sometimes commit the opposite fallacy: dismissing correlation entirely, as if it could never suggest causation. This would dismiss a large swath of important scientific evidence.

In conclusion, correlation is a valuable type of scientific evidence. But correlations must first be confirmed as real, and then every possible causative relationship must be systematically explored. In the end, correlation can be used as powerful evidence for a cause-and-effect relationship between a treatment and a benefit, a risk factor and a disease, or a social or economic factor and various outcomes. But it is also one of the most abused types of evidence, because it is easy and even tempting to come to premature conclusions based upon the preliminary appearance of a correlation.