I was a little puzzled when one of my friends, responding to a post of mine on
Facebook, warned me that I was confusing correlation with causation. The remark
has stayed in my mind for quite a few months now. In fact, I have been trying to
work out whether I really was confusing the two. When I thought of correlation,
the first thing that came to my mind was the good old school days. The weather
was not so deceptive in those days. Whether school reopened on the first, second
or third of June, the beginning of the rainy season coincided exactly with the
reopening, and we would all be soaked in rain that evening. Still, we used to
enjoy the first rain of the season and the first day of the school year on the
same day. So we can easily draw a correlation between the two, and the same
pattern repeated every year. But does it actually imply that one of them caused
the other? I should say that we cannot conclude that. Let us consider another
example. Every time I flip the switch, my lamp glows. As in the previous case,
we may derive a correlation between the two events. Here I can definitely
conclude that flipping the switch causes the lamp to glow. So how can I
ascertain whether an observed correlation really reflects a cause?
Generally, if one factor (A) is observed merely to be correlated with
another factor (B), it is sometimes taken for granted that A is causing B, even
when no evidence supports it. This is a logical fallacy because there are at
least five possibilities. The first is that A may be the cause of B. The second
is that B may be the cause of A. The third is that some unknown third factor C
may actually be the cause of both A and B. The fourth is that there may be a
combination of the above three relationships. For example, B may be the cause
of A at the same time as A is the cause of B (so it is not true that the only
relationship between A and B is that A causes B). This describes a
self-reinforcing system. The fifth is that the "relationship" is a coincidence,
or is so complex or indirect that it is more effectively called a coincidence
(i.e. two events occurring at the same time that have no direct relationship to
each other besides the fact that they are occurring at the same time). A larger
sample size helps to reduce the chance of a coincidence, unless there is a
systematic error in the experiment.
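
The point about sample size can be checked with a small simulation. The sketch
below is only an illustration, not a rigorous statistical test: it draws two
series of random numbers that have nothing to do with each other and counts how
often they nevertheless look strongly correlated. The function names, the
threshold of 0.5 and the number of trials are all assumptions chosen for the
example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def chance_of_strong_correlation(sample_size, trials=2000, threshold=0.5, seed=0):
    """Fraction of trials in which two unrelated series look strongly correlated.

    The threshold and trial count are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(sample_size)]
        b = [rng.gauss(0, 1) for _ in range(sample_size)]  # independent of a
        if abs(pearson(a, b)) >= threshold:
            hits += 1
    return hits / trials


small = chance_of_strong_correlation(sample_size=5)
large = chance_of_strong_correlation(sample_size=100)
print(f"5 observations:   {small:.3f} of trials look strongly correlated")
print(f"100 observations: {large:.3f} of trials look strongly correlated")
```

With only a handful of observations, pure chance produces a "strong"
correlation in a sizeable share of the trials; with a hundred observations it
should almost never do so. Note that this addresses coincidence only: a
systematic error or a hidden third factor would survive any sample size.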
Since the 1950s, both the atmospheric CO2 level and obesity levels have
increased sharply. If we conclude that atmospheric CO2 causes obesity, or vice
versa, we reach a false conclusion. A more plausible explanation is a common
third factor: richer populations tend to eat more food and consume more energy.
The two have no direct impact on one another. Similarly, if a person with an
ailment takes medicine, prays and is cured, concluding that the cure is due to
praying is a false conclusion, since not all people who pray are cured. The
conclusion that the cure is due to the medicine is a more logical one. However,
since not all people taking medicine are cured, there might be other factors
which are important.
Consider a weight suspended by a string. If you cut the string, the
weight will fall. If you repeat the experiment again and again, the same thing
will happen. So we can conclude that the action of cutting the string causes
the weight to fall. The cause of the event seems to be established beyond
doubt. Now let us examine the event again. If the string is not cut, the weight
will not fall, which means that cutting the string is the cause. But even if
the string is cut, in the absence of gravitational force the weight would have
remained in its position, which leads us to the conclusion that the
gravitational force is the cause. Again, if you observe the act of cutting the
string, the string is cut because we applied a force, which means that the
application of force is the cause. Even if you apply the force, if the knife
were not sharp, the string would have remained intact, and that makes the
sharpness of the knife the cause. And we were able to cut the string with a
sharp knife because the knife is harder than the string. In short, the
perspective of the person observing the event is also crucial. It would be
illogical to single out any one of the above as the cause of the event. A more
logical conclusion is to acknowledge all of the above as causes of the event.
Another example is the observation that as ice cream sales increase, the
rate of drowning deaths increases sharply. One might therefore conclude that ice
cream consumption causes drowning. That argument fails to recognize the role of
season and temperature in ice cream sales. Ice cream is sold during the
hot summer months at a much greater rate than during colder times, and it is
during these hot summer months that people are more likely to engage in
activities involving water, such as swimming. The increased drowning deaths are
simply caused by more exposure to water-based activities, not ice cream. The
stated conclusion is false.
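
A hidden third factor of this kind can be imitated with invented numbers. In
the sketch below, temperature drives both ice cream sales and a drowning index,
and neither of the two affects the other. Every coefficient and noise level is
made up for illustration, and the partial correlation used to "hold temperature
fixed" is one standard way of controlling for a third variable, chosen here as
an assumption rather than taken from any real study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)
days = 5000

# Invented model: temperature is the common cause (factor C).
temperature = [rng.uniform(10, 40) for _ in range(days)]
# Sales depend on temperature plus noise, never on drownings.
ice_cream_sales = [20 * t + rng.gauss(0, 100) for t in temperature]
# The drowning index depends on temperature plus noise, never on sales.
drowning_index = [0.3 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream_sales, drowning_index)
controlled = partial_correlation(ice_cream_sales, drowning_index, temperature)
print(f"correlation ignoring temperature:        {raw:.2f}")
print(f"correlation with temperature held fixed: {controlled:.2f}")
```

In this toy model the raw correlation is clearly positive even though ice cream
has no effect on drowning at all, and it shrinks to roughly zero once
temperature is accounted for. Real data are rarely this clean, and the third
factor is often not known in advance.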
Much scientific evidence is based upon a correlation of variables: they
are observed to occur together. Scientists are careful to point out that
correlation does not necessarily mean causation. The assumption that A causes B
simply because A correlates with B is generally not accepted as a legitimate
form of argument. However, sometimes people commit the opposite fallacy,
dismissing correlation entirely, as if it does not suggest causation. This would dismiss a
large swath of important scientific evidence.
In conclusion, correlation is a valuable type of scientific evidence.
But first correlations must be confirmed as real, and then every possible
causative relationship must be systematically explored. In the end correlation
can be used as powerful evidence for a cause and effect relationship between a
treatment and benefit, a risk factor and a disease, or a social or economic
factor and various outcomes. But it is also one of the most abused types of
evidence, because it is easy and even tempting to come to premature conclusions
based upon the preliminary appearance of a correlation.