Covariance, Variance and Mean

Let $X = \{x_1, \dots, x_N\}$ be a dataset. Let $p(x)$ be a continuous probability density function defined on $(-\infty, \infty)$.

Mean

The mean (also called average or expected value) measures the central tendency of a set of numbers. It tells you where the “center” of the data lies.

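For a dataset of $N$ values $x_1, \dots, x_N$, the mean is the sum of the values divided by their count:

$$\mu_X = \frac{1}{N} \sum_{i=1}^{N} x_i$$

For a continuous random variable $X$ with density $p(x)$, the mean is defined as:

$$\mu_X = \mathbb{E}[X] = \int_{-\infty}^{\infty} x \, p(x) \, dx$$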
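As a minimal sketch of both definitions, the snippet below averages a small dataset and approximates the integral of $x \, p(x)$ with a midpoint rule. The function names, the sample values, and the choice of a uniform density on $[0, 2]$ are illustrative assumptions, not part of the original notes.

```python
def mean(values):
    """Arithmetic mean of a finite dataset."""
    return sum(values) / len(values)


def continuous_mean(pdf, lo, hi, steps=100_000):
    """Approximate the integral of x * pdf(x) over [lo, hi] (midpoint rule)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of the i-th sub-interval
        total += x * pdf(x) * dx
    return total


# Hypothetical dataset, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
sample_mean = mean(data)

# Assumed density: uniform on [0, 2], i.e. p(x) = 1/2, whose mean is 1.
uniform_mean = continuous_mean(lambda x: 0.5, 0.0, 2.0)

print(sample_mean, uniform_mean)
```

The integration bounds are finite here because the assumed density is zero outside $[0, 2]$; a density with unbounded support would need bounds wide enough to capture almost all of its mass.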

Variance

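Variance measures how spread out the values are around the mean. For a dataset of $N$ values $x_1, \dots, x_N$ with mean $\mu_X$:

$$\sigma_X^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)^2$$

For a continuous random variable $X$ with density $p(x)$, the variance is defined as:

$$\mathrm{Var}(X) = \mathbb{E}\left[(X - \mu_X)^2\right] = \int_{-\infty}^{\infty} (x - \mu_X)^2 \, p(x) \, dx$$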
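To make "spread around the mean" concrete, here is a small sketch that compares two hypothetical datasets with the same mean but different spread. It assumes the population convention (dividing by $N$); dividing by $N - 1$ instead gives the unbiased sample variance.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: average squared deviation from the mean (divides by N)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


# Two illustrative datasets that share the mean 5 but differ in spread.
tight = [4.0, 5.0, 6.0]
wide = [0.0, 5.0, 10.0]

var_tight = variance(tight)  # deviations -1, 0, 1
var_wide = variance(wide)    # deviations -5, 0, 5

print(var_tight, var_wide)
```

The wider dataset has the larger variance even though both are centered on the same value.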

Covariance

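Covariance measures how two variables change together, that is, whether they tend to deviate from their means in the same direction.

$$\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)$$

  • where $Y$ is another dataset with its own mean $\mu_Y$ and values $y_i$

For two continuous random variables $X$ and $Y$, covariance is defined as:

$$\mathrm{Cov}(X, Y) = \mathbb{E}\left[(X - \mu_X)(Y - \mu_Y)\right]$$

or equivalently in integral form as:

$$\mathrm{Cov}(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) \, p(x, y) \, dx \, dy$$

where $p(x, y)$ is the joint density of $X$ and $Y$.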
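For paired datasets, the covariance can be sketched as the average product of deviations. The second function uses the standard identity $\mathrm{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$, which is not stated in these notes and is included only as a cross-check. Names and sample values are illustrative assumptions, and the population convention (dividing by $N$) is assumed.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: mean of the products of paired deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


def covariance_shortcut(xs, ys):
    """Cross-check using E[XY] - E[X] * E[Y]."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


# Hypothetical paired observations: ys is exactly twice xs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

cov_xy = covariance(xs, ys)
cov_xy_shortcut = covariance_shortcut(xs, ys)

print(cov_xy, cov_xy_shortcut)
```

Both forms agree on this data. The deviation form is usually the numerically safer choice when the means are large relative to the spread.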

Covariance Properties

  • Positive covariance: the two variables grow together.
  • Negative covariance: when $X$ is above its mean, $Y$ tends to be below its mean.
  • Zero covariance: no linear relationship.

Variance is a special case of covariance:

$$\mathrm{Cov}(X, X) = \mathbb{E}\left[(X - \mu_X)^2\right] = \mathrm{Var}(X)$$
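The sign of the covariance and the identity $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$ can be checked on small made-up datasets. This is a sketch under the population convention (dividing by $N$); the variable names and values are assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


def variance(xs):
    """Population variance, computed independently of covariance()."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [10.0, 20.0, 30.0, 40.0, 50.0]  # grows as x grows
falling = [5.0, 4.0, 3.0, 2.0, 1.0]      # shrinks as x grows

cov_rising = covariance(x, rising)    # positive
cov_falling = covariance(x, falling)  # negative
cov_self = covariance(x, x)           # same value as variance(x)

print(cov_rising, cov_falling, cov_self, variance(x))
```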

Correlation

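Correlation is covariance normalized so that the output lies in $[-1, 1]$:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.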

Interpretation:

  • $\rho > 0$: the two random variables are positively correlated (i.e. they increase together)
  • $\rho = 0$: no linear correlation, but they may still have nonlinear dependence
  • $\rho < 0$: the two random variables are negatively correlated (i.e. when $X$ grows, $Y$ decreases)
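The three cases above can be illustrated with a hand-written Pearson correlation. In this sketch the data are assumptions chosen so that each case is exact: an increasing line, a decreasing line, and $y = x^2$ on symmetric inputs, where $Y$ is fully determined by $X$ and yet the correlation is zero.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations.

    Undefined (division by zero) if either sequence is constant.
    """
    sigma_x = math.sqrt(covariance(xs, xs))
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]

rho_up = correlation(x, [2 * v + 1 for v in x])   # increasing line
rho_down = correlation(x, [-3 * v for v in x])    # decreasing line
rho_square = correlation(x, [v ** 2 for v in x])  # nonlinear dependence

print(rho_up, rho_down, rho_square)
```

Up to floating-point rounding, the two lines give correlations of $1$ and $-1$, while the parabola gives $0$ because the positive and negative products of deviations cancel.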

Covariance is scale-dependent; correlation fixes that by dividing by the standard deviations $\sigma_X$ and $\sigma_Y$.
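A quick way to see the scale dependence is to change the units of one variable, for example from meters to centimeters. In the sketch below (hypothetical data, population convention) the covariance is multiplied by the same factor of 100, while the correlation is unchanged.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    sigma_x = math.sqrt(covariance(xs, xs))
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)


# Illustrative measurements; x_scaled is x expressed in units 100 times smaller.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
x_scaled = [100 * v for v in x]

cov_before = covariance(x, y)
cov_after = covariance(x_scaled, y)   # scales with the units of x
rho_before = correlation(x, y)
rho_after = correlation(x_scaled, y)  # unaffected by the rescaling

print(cov_before, cov_after)
print(rho_before, rho_after)
```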

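[!warning] Correlation isn’t causality
Covariance is often confused with correlation, but they’re two different concepts: covariance depends on the scale of the variables, while correlation is its normalized form. Neither one implies that one variable causes the other.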