Covariance

TL;DR

Measures how two random variables vary together and whether they move in the same or opposite directions.
Computed as the average product of the two variables’ deviations from their expected values.
Magnitude depends on units and scale; use the correlation coefficient to standardize for comparisons.

Definition

Covariance is a measure of the joint variability of two random variables. It indicates how the two variables are related and whether they change together.

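The formula for covariance is:

Cov(X,Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - E(X)) (y_i - E(Y))

where X and Y are the two random variables, x_i and y_i are their paired observations, E(X) and E(Y) are the expected values of X and Y, and n is the number of observations. Dividing by n gives the population covariance; the sample covariance divides by n - 1 instead.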

Explanation

Sign:
- Positive covariance: the two variables tend to change in the same direction.
- Negative covariance: the two variables tend to change in opposite directions.
- Zero covariance: there is no linear relationship between the two variables. This does not imply independence, because the variables may still be related nonlinearly.
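
To illustrate why zero covariance only rules out a linear relationship, here is a minimal Python sketch. The data and the helper name `covariance` are made up for illustration, and the helper divides by n as in the formula above.

```python
from statistics import fmean

def covariance(xs, ys):
    """Population covariance: average product of deviations (divides by n)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: y is fully determined by x, but the relationship is not linear.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

print(covariance(xs, ys))  # 0.0
```

The positive and negative products of deviations cancel exactly here, so the covariance is zero even though y depends entirely on x.
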

Computation steps:
- Compute the expected value (average) for each variable.
- Subtract the expected value from each observation to obtain deviations.
- Multiply corresponding deviations for the two variables and sum these products.
- Divide the sum by n to obtain the covariance.
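
The computation steps above can be sketched in Python as follows. This is an illustrative helper, not a library function: the name `covariance`, the input check, and the sample numbers are assumptions.

```python
from statistics import fmean

def covariance(xs, ys):
    """Population covariance of paired observations (divides by n)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    # Step 1: expected value (average) of each variable.
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Step 2: deviation of each observation from its variable's mean.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Step 3: multiply corresponding deviations and sum the products.
    total = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Step 4: divide the sum by n.
    return total / len(xs)

print(covariance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 4.0
```

Library routines often divide by n - 1 instead, which gives the sample covariance. Python's `statistics.covariance` (3.10+) does so and would return 5.0 for this data.
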

Interpretation:
- A larger absolute covariance suggests that the variables move together more strongly, but only when the comparison is made at the same units and scale. The magnitude alone does not measure the strength of the relationship.
- Covariance is not standardized; its magnitude depends on the variables’ units and scale.
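
A short Python sketch of this scale dependence, using made-up measurements: the same heights expressed in centimeters and in meters give covariances that differ by a factor of 100, although the underlying relationship is unchanged. The helper `covariance` and all values are illustrative assumptions.

```python
from statistics import fmean

def covariance(xs, ys):
    """Population covariance: average product of deviations (divides by n)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical measurements for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 74, 82]
heights_m = [h / 100 for h in heights_cm]  # same heights, different unit

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)

print(f"{cov_cm:.2f} cm*kg")  # 160.00 cm*kg
print(f"{cov_m:.2f} m*kg")    # 1.60 m*kg
```
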

Examples

Heights and weights

Suppose a dataset contains the heights and weights of a group of people. To calculate the covariance of height and weight:

- Calculate the expected value (average) for height and for weight by summing each variable’s observations and dividing by the number of observations.
- Subtract the expected value from each observation to get deviations for both height and weight.
- Multiply the height and weight deviations for each observation and sum these products.
- Divide the sum by n to obtain the covariance of height and weight.

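The resulting covariance indicates whether taller individuals tend to be heavier (positive covariance) or lighter (negative covariance). Its magnitude is expressed in the product of the original units (for example, cm·kg if height is in centimeters and weight in kilograms), so it is not a standardized measure of the relationship’s strength.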
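
As a worked instance of this example, the following Python sketch runs the calculation on a small invented dataset. The five height and weight values are hypothetical, and the variable names are chosen only for illustration.

```python
from statistics import fmean

# Hypothetical data for five people.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 68, 72, 80]

# Expected value (average) of each variable.
mean_height = fmean(heights_cm)   # 170.0
mean_weight = fmean(weights_kg)   # 67.0

# Products of the paired deviations from the means.
products = [
    (h - mean_height) * (w - mean_weight)
    for h, w in zip(heights_cm, weights_kg)
]

# Divide the summed products by n.
cov_height_weight = sum(products) / len(heights_cm)

print(products)           # [120.0, 35.0, 0.0, 25.0, 130.0]
print(cov_height_weight)  # 62.0
```

The result is positive, so in this invented dataset taller people tend to be heavier.
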

Use cases

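Covariance is used to analyze relationships between variables in fields such as statistics and machine learning. For example, a covariance matrix collects the pairwise covariances of many variables and underlies techniques such as principal component analysis.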

Notes or pitfalls

Covariance is not normalized and therefore cannot be directly compared across different variable pairs.
To standardize covariance and compare relationship strength across variables, divide the covariance by the product of the two variables’ standard deviations to obtain the correlation coefficient.
The correlation coefficient ranges from -1 to 1:
- -1 indicates a perfect negative linear relationship.
- 1 indicates a perfect positive linear relationship.
- 0 indicates no linear relationship.
In short, covariance gives the direction of a linear relationship, while the correlation coefficient gives its strength on a standardized scale.
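
The standardization described above can be sketched in Python as follows. The helper names `covariance` and `correlation` and the data are illustrative assumptions. Both the covariance and the standard deviations use the population form (dividing by n), so the divisors are consistent.

```python
from statistics import fmean, pstdev

def covariance(xs, ys):
    """Population covariance: average product of deviations (divides by n)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations.

    Undefined (division by zero) if either variable is constant.
    """
    return covariance(xs, ys) / (pstdev(xs) * pstdev(ys))

xs = [1, 2, 3, 4, 5]
print(round(correlation(xs, [2, 4, 6, 8, 10]), 6))  # 1.0  (perfect positive)
print(round(correlation(xs, [10, 8, 6, 4, 2]), 6))  # -1.0 (perfect negative)
print(round(correlation(xs, [4, 1, 0, 1, 4]), 6))   # 0.0  (no linear relationship)
```

The results are rounded because floating-point square roots can leave the ratio a tiny distance from exactly 1 or -1.
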

Correlation coefficient
Standard deviation
Expected value