Correlation isn’t causality

  • Correlation: a special case of covariance; it’s an index that tells whether two variables move together in some way. For more details, read: covariance.
  • Causality: means that one variable directly produces a change in another. For example, if X changes, then Y changes.
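As a quick sketch of the first point (with made-up numbers): the correlation coefficient is just the covariance rescaled by the two standard deviations, which pins it into [-1, 1]:

```python
# Hypothetical data: two variables that tend to move together.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Covariance: average joint deviation from the means.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Correlation: covariance divided by the product of the standard
# deviations, so the result always lands in [-1, 1].
sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
r = cov / (sx * sy)
print(round(r, 3))  # positive: the two series move together
```

A value near +1 or -1 only says the variables move together; it says nothing about which one (if either) drives the other.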

To claim causality, much stronger evidence is required: controlled experiments, randomized trials, strong domain knowledge, ruling out confounding variables, temporal ordering (the cause comes before the effect).

Example: if you increase the dosage of a drug (X), then blood pressure (Y) decreases. But this is demonstrated through experiments and medical trials.
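A minimal simulation of such a trial, with invented numbers (an assumed true effect of -8 on blood pressure), shows why random assignment lets us read the difference in group means causally: the coin flip breaks any link between treatment and confounders.

```python
import random

random.seed(0)

# Toy randomized trial (illustrative numbers, not medical data).
treated, control = [], []
for _ in range(1000):
    baseline = random.gauss(140, 10)      # pre-treatment blood pressure
    if random.random() < 0.5:             # coin-flip assignment
        # assumed causal effect of -8, plus measurement noise
        treated.append(baseline - 8 + random.gauss(0, 5))
    else:
        control.append(baseline + random.gauss(0, 5))

effect = sum(treated) / len(treated) - sum(control) / len(control)
print(round(effect, 1))  # close to the assumed causal effect of -8
```

In observational data the same difference in means could be produced entirely by who happened to get the drug; randomization is what rules that out.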

In general two variables can be correlated because of:

  • coincidence
  • a hidden third variable (a confounder)
  • reverse causality (Y causes X)
  • shared underlying trends

Without proper analysis, correlation alone can mislead you.

Visual example:

   Y
   ↑
10 |                   *
 9 |                 *
 8 |               *
 7 |            *
 6 |         *
 5 |      *
 4 |   *
 3 | *
 2 |
 1 +------------------------------→ X
       1  2  3  4  5  6  7  8  9
  • X = ice cream sales
  • Y = drowning incidents

Visually it seems like more ice cream sales cause more drownings. But this is NOT causal. Both increase because of a third variable: temperature.

  • When it’s hot → more people buy ice cream
  • When it’s hot → more people go swimming → more accidents
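The confounder above can be sketched in a few lines (all numbers invented): temperature drives both series, they correlate strongly, and the correlation vanishes once temperature is held fixed.

```python
import random

random.seed(1)

# Temperature drives both series; sales and drownings never
# influence each other directly.
temps = [random.uniform(10, 35) for _ in range(2000)]
sales = [2.0 * t + random.gauss(0, 5) for t in temps]
drownings = [0.3 * t + random.gauss(0, 2) for t in temps]

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5

raw = corr(sales, drownings)
print(round(raw, 2))  # strongly positive

# Hold temperature fixed: correlate the residuals after removing
# its (known, because we simulated it) effect.
res_s = [s - 2.0 * t for s, t in zip(sales, temps)]
res_d = [d - 0.3 * t for d, t in zip(drownings, temps)]
partial = corr(res_s, res_d)
print(round(partial, 2))  # near zero
```

In real data the confounder’s effect isn’t known exactly, so the residuals come from a regression on temperature, but the logic is the same: control for the third variable and the spurious link disappears.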

Real Machine Learning Cases

Case 1 - The Hospital Mortality Paradox

Task: predict whether a patient admitted to the ICU will survive. After training, the model discovered that patients who received a certain treatment had a higher survival rate.

So the model learned a positive correlation: treatment → likely to survive.

But in the real world, doctors give that treatment only to patients who are already stable and not extremely sick:

  • Stable patients get the treatment → more likely to survive.
  • Critical patients cannot receive the treatment → less likely to survive.

The real cause was how sick the patient already was, not the treatment.

If this model were deployed in practice, it would not improve patients’ survival, and in some cases it would be dangerous for them.
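A toy simulation of this paradox (assumed probabilities, no real medical data): the treatment has zero effect on survival, yet treated patients survive far more often, simply because doctors treat the stable ones.

```python
import random

random.seed(2)

# Each row: (severe, treated, survived). Severity — not treatment —
# drives both who gets treated and who survives.
rows = []
for _ in range(10000):
    severe = random.random() < 0.4
    p_treat = 0.1 if severe else 0.8     # mostly stable patients treated
    treated = random.random() < p_treat
    p_survive = 0.4 if severe else 0.9   # treatment has NO effect here
    rows.append((severe, treated, random.random() < p_survive))

def rate(rs):
    return sum(s for *_, s in rs) / len(rs)

treated_rows = [r for r in rows if r[1]]
untreated_rows = [r for r in rows if not r[1]]
print(round(rate(treated_rows), 2), round(rate(untreated_rows), 2))

# Within each severity group, the treatment makes no difference:
for sev in (False, True):
    grp = [r for r in rows if r[0] == sev]
    t = rate([r for r in grp if r[1]])
    u = rate([r for r in grp if not r[1]])
    print(sev, round(t - u, 2))  # both close to 0
```

A model trained on the raw (treated, survived) pairs would happily learn the large marginal gap; stratifying by severity, as above, is what reveals that the correlation is not causal.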

Case 2 - The Job-Hiring Algorithm (Amazon)

Task: predict if the candidate should be hired.

The model was trained on a dataset of current and previous employees’ CVs. Historically, Amazon had more male applicants in technical roles.

A model could then learn that: male candidate → likely to be hired.

It wasn’t explicit, but features correlated with gender (e.g. “women’s colleges”, certain keywords) were penalized.

In reality, gender doesn’t cause higher job performance.

Causal chain: biased historical hiring → biased dataset → model learns the correlation (i.e. learns the bias).
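That chain can be reproduced with a toy dataset (all numbers invented): true skill is independent of gender, but biased historical labels make a gender-correlated proxy feature look predictive, so any model fit to these labels would penalize it.

```python
import random

random.seed(3)

# Each row: (proxy_feature, hired). The proxy (e.g. a "women's
# college" keyword) only ever appears on some women's CVs; skill
# is independent of gender.
data = []
for _ in range(5000):
    female = random.random() < 0.3
    proxy = female and random.random() < 0.5
    skilled = random.random() < 0.5          # independent of gender
    p_hired = 0.7 if skilled else 0.3
    if female:
        p_hired -= 0.2                       # bias baked into the labels
    data.append((proxy, random.random() < p_hired))

def hire_rate(rows):
    return sum(h for _, h in rows) / len(rows)

with_proxy = [r for r in data if r[0]]
without = [r for r in data if not r[0]]

# The proxy keyword predicts a lower historical hiring rate, so a
# model trained on these labels learns to penalize it — reproducing
# the bias rather than measuring performance.
print(round(hire_rate(with_proxy), 2), round(hire_rate(without), 2))
```

Nothing in the simulation links the proxy to skill; the gap between the two rates comes entirely from the biased labels, which is exactly what the trained model ends up encoding.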