What Is The Aim Of Finding Correlation? Why Is It Used If Correlation Doesn’t Imply Causation?

Have you ever come across weird statistics about two events that are seemingly unrelated? For example, if one were asked to predict the sales of air conditioner (AC) units based solely on the knowledge of sales of frozen yogurt, the prediction may very well seem ridiculous. ACs and yogurts are, after all, two very different consumer goods produced by unrelated industries. One may argue that yogurt has as much in common with an AC as planet Earth has with Halley’s Comet.

A scatterplot of two random variables, X and Y. It can be seen that an increase in X is correlated to an increase in Y. However, what remains to be established is whether an increase in X causes an increase in Y (Photo Credit : Skbkekas/Wikimedia commons)

Or consider this: in the first half of 2020, the media was abuzz with a study correlating temperature with COVID-19 transmission. Yet a second study, carried out on data from the same time interval, concluded otherwise and did not receive similar attention. Why is that so? Does it mean that finding correlations is futile?

No.

First, let’s try to understand what correlation is before proceeding to find its merits. Then, we’ll move on to causation.

Finding Meaning From Random Data: Exploratory Analysis

An event is any observable occurrence whose outcome can be recorded as a number, for example, the sales of AC units, the marks obtained by students in a class, or the goals scored by a player. These random events from real life are stored in the form of data; by using this data, sellers, teachers and coaches may be able to draw some conclusions.

When many data points (numerical values) are available for a random event, the event is called a random variable: random because the values it takes cannot be predicted before the occurrence, and variable because the value changes with each new occurrence.

When two random variables are taken into consideration, there may be some relationship between them that helps us understand the events better and make more accurate predictions about their future outcomes. This comes in quite handy when only limited initial data is available.

Two basic statistical concepts must be introduced that will help us understand correlation better.

The first is variance. Given a random variable X with n data points, variance is the average of the squared differences between each data point and the mean of X. When plotted on a graph, variance indicates the dispersion of values. A more dispersed dataset would have a higher variance than a closely spaced dataset.

Variance indicates a dispersion of data points about the mean. Variance of the green variable is greater than the variance of the red variable. (Photo Credit : Master Uegly/Wikimedia commons)

The second is covariance. Given two random variables, X and Y, a change in the values of one variable may or may not be associated with a change in the values of the other variable. Covariance assigns a numerical value to this tendency of change in the values.
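Written out, the two quantities can be expressed as below. This is a sketch under the common population convention of dividing by n; that convention is an assumption here, since the text does not say whether n or n - 1 is used. The symbols $x_i$ and $y_i$ (the individual data points) and $\bar{x}$ and $\bar{y}$ (the means of X and Y) are introduced only for this illustration.

$$\operatorname{var}(X) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

A positive covariance means that above-average values of X tend to occur together with above-average values of Y, while a negative covariance means they tend to occur with below-average values of Y.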

Correlation

Correlation is a mathematical tool used to indicate a relationship between two random variables. The aim is to find out how closely the scattered points follow a straight line (linear association). Given n paired data points for two variables X and Y, the correlation coefficient, r, is given by:

$$r = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}$$

where,

cov(X, Y) = covariance between X and Y

var(X), var(Y) = variances of X and Y

From the mathematical definition of correlation, $-1 \le r \le 1$ always.
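To make the definition concrete, here is a minimal Python sketch that computes the variance, the covariance and r by hand for a tiny dataset. The data values and the function names are invented for illustration, and the population convention (dividing by n) is an assumption; the choice does not affect r, because the same divisor appears in the numerator and the denominator.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def variance(xs):
    """Population variance: the covariance of a variable with itself."""
    return covariance(xs, xs)


def pearson_r(xs, ys):
    """Correlation coefficient: covariance scaled by both spreads."""
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(variance(x))                # 2.0
print(covariance(x, y))           # 1.2
print(round(pearson_r(x, y), 4))  # 0.7746
```

Dividing the covariance by the square root of the two variances removes the units of X and Y, which is why r can be compared across very different datasets.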

The following cases arise:

If r = 1, then the data points lie on a straight line and there is no scattering. We say that X is perfectly positively correlated with Y. This means that a change in X is accompanied by a proportional change in Y in the same direction, which on a graph is seen as a straight line with a positive slope.

If r = -1, then the data points also lie on a straight line and there is no scattering. X is still perfectly linearly correlated with Y. However, an increase in X is accompanied by a proportional decrease in Y, which on a graph is shown as a straight line with a negative slope.

If -1 < r < 0, then the points remain scattered around a best-fit approximation line with a negative slope.

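If 0 < r < 1, then the points remain scattered around a best-fit approximation line with a positive slope.

If r = 0, then there is no linear association between X and Y, although a non-linear relationship may still exist.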
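The cases above can be checked numerically. The following Python sketch uses made-up data: two exact straight lines and two scattered series. The helper name `pearson_r` and all of the values are assumptions chosen only to illustrate each case.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6]

rising = [2 * v + 1 for v in x]     # exact line, positive slope
falling = [10 - 3 * v for v in x]   # exact line, negative slope
noisy_up = [3, 4, 8, 7, 12, 11]     # hypothetical scattered upward trend
noisy_down = [12, 11, 7, 8, 3, 4]   # the same values in reverse order

r_rising = pearson_r(x, rising)          # expected to be 1 (up to rounding)
r_falling = pearson_r(x, falling)        # expected to be -1 (up to rounding)
r_noisy_up = pearson_r(x, noisy_up)      # strictly between 0 and 1
r_noisy_down = pearson_r(x, noisy_down)  # strictly between -1 and 0

for label, r in [("rising", r_rising), ("falling", r_falling),
                 ("noisy up", r_noisy_up), ("noisy down", r_noisy_down)]:
    print(f"{label:>10}: r = {r:+.3f}")
```

Note that the slope of the exact lines (2 and -3 here) does not change r; only the direction of the line and the amount of scatter around it matter.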

Correlation coefficient and associated plots. (Photo Credit : Laerd Statistics/Wikimedia commons)

Correlation May Not Imply Causation

Having studied the basics of correlation, let’s look more closely at how to interpret it. Often, quite erroneously, a correlation between two random variables X and Y is interpreted as causation, i.e., X causes Y. Take the example of AC sales (X) and frozen yogurt sales (Y). If a positive correlation is found (say, r = 0.8), would that imply that X caused Y or vice versa? Not necessarily. A more plausible explanation is that some other factor (Z) influences both X and Y. What could Z be?

What random variable could cause an increase in both AC sales and yogurt sales? That random variable could probably be temperature.

Think about this. Frozen yogurt is a dessert, which has a much higher probability of being consumed during the summer than during other seasons. An increase in temperatures could very well cause more people to buy such desserts just to cool themselves down. Similarly, ACs regulate room temperature, which is very useful in the summer season. Increasing temperatures may force even the most hardened individuals to buy an AC unit. Thus, increasing temperatures may lead to an increase in both AC sales and frozen yogurt sales, or in simpler terms, Z causes both X and Y.

Correlation is useful because it helps determine hidden factors affecting other events. (Photo Credit : desdemona72/Shutterstock)

Here, the relationships between (Z, X) and (Z, Y) are causal. We can predict that an increase in X would be associated with an increase in Y, and we can make this prediction because of our knowledge of the common variable Z (temperature). What correlation did, in this instance, was help us find the causal factor behind those two events.
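A small simulation can illustrate the idea of a common cause. In the Python sketch below, temperature (Z) drives both AC sales (X) and yogurt sales (Y), and neither X nor Y appears in the other's formula. Every number here (the temperature range, the coefficients 5 and 8, the noise levels, the sample size) is a hypothetical assumption, not real sales data. Subtracting the straight-line effect of Z from each variable and correlating what is left is one simple way to check whether X and Y are still related once temperature is accounted for.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, zs):
    """What is left of ys after removing a least-squares line fitted on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


rng = random.Random(0)  # fixed seed so that reruns give the same numbers
n = 2000

# Z: hypothetical daily temperatures in degrees Celsius.
temperature = [rng.uniform(10, 40) for _ in range(n)]
# X and Y each depend on Z plus independent noise, never on each other.
ac_sales = [5 * t + rng.gauss(0, 20) for t in temperature]
yogurt_sales = [8 * t + rng.gauss(0, 30) for t in temperature]

r_xy = pearson_r(ac_sales, yogurt_sales)
r_xy_given_z = pearson_r(residuals(ac_sales, temperature),
                         residuals(yogurt_sales, temperature))

print(f"r(X, Y)                   = {r_xy:.2f}")
print(f"r(X, Y) after removing Z  = {r_xy_given_z:.2f}")
```

In this setup the raw correlation between X and Y should come out strongly positive, while the correlation between the residuals should be close to zero, which is what one would expect if Z alone explains the association.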

About the Author

Argha has a Bachelors in Physics, Chemistry and Mathematics from University of Delhi, India. He enjoys discussing STEM topics and football. With a belief that studying science should be enjoyable and not scary, he wants to play his part in a changing world.
