
Covariance

  • Measures how two random variables vary together and whether they move in the same or opposite directions.
  • Computed by multiplying paired deviations from each variable’s expected value, summing the products, and dividing by the number of observations.
  • Magnitude depends on units and scale; use the correlation coefficient to standardize for comparisons.

Covariance is a measure of the joint variability of two random variables. It indicates whether the two variables tend to change together and in which direction, capturing the linear part of their relationship.

The formula for covariance is:

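Cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - E(X))(y_i - E(Y))}{n}

where x_i and y_i are the i-th paired observations of the random variables X and Y, E(X) and E(Y) are the expected values (means) of X and Y, and n is the number of observations. This is the population covariance; when covariance is estimated from a sample, the sum is usually divided by n - 1 instead.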
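A minimal Python sketch of the formula, assuming the observations arrive as two equal-length sequences of paired values. The function name `covariance` is our own, not a library API, and it divides by n exactly as the formula does.

```python
from typing import Sequence


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Population covariance: sum of products of paired deviations, divided by n."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n  # expected value (mean) of the x observations
    mean_y = sum(ys) / n  # expected value (mean) of the y observations
    # Sum of (x_i - mean_x) * (y_i - mean_y) over all paired observations.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / n


print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5
```

For comparison, Python’s `statistics.covariance` (available since Python 3.10) returns the sample covariance, dividing by n - 1, so for the same two lists it gives 10/3 rather than 2.5.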

  • Sign:
    • Positive covariance: the two variables tend to change in the same direction.
    • Negative covariance: the two variables tend to change in opposite directions.
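    • Zero covariance: there is no linear relationship between the two variables, although they may still be related nonlinearly (for example, y = x^2 over values of x symmetric about zero).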
  • Computation steps:
    • Compute the expected value (average) for each variable.
    • Subtract the expected value from each observation to obtain deviations.
    • Multiply corresponding deviations for the two variables and sum these products.
    • Divide the sum by n to obtain the covariance.
  • Interpretation:
    • A larger absolute covariance means the variables vary together more in their raw units, but because the magnitude also grows with the variables’ scale, it is not by itself a reliable measure of relationship strength.
    • Covariance is not standardized; its magnitude depends on the variables’ units and scale.
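To see the sign rules and the scale dependence concretely, here is a small plain-Python sketch. The datasets are made up for illustration, and the `covariance` helper is our own (it divides by n, as in the formula).

```python
def covariance(xs, ys):
    # Population covariance: average product of paired deviations from the means.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
same_direction = [2.0, 4.0, 5.0, 4.0, 6.0]      # illustrative: tends to rise with x
opposite_direction = [9.0, 7.0, 6.0, 4.0, 1.0]  # illustrative: tends to fall as x rises
symmetric_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
squared = [x * x for x in symmetric_x]          # depends on x, but not linearly

print(covariance(xs, same_direction))      # positive
print(covariance(xs, opposite_direction))  # negative
print(covariance(symmetric_x, squared))    # 0.0: no linear relationship

# Scale dependence: multiplying every x by 100 (e.g. a change of units)
# multiplies the covariance by 100, though the relationship is unchanged.
xs_scaled = [100 * x for x in xs]
ratio = covariance(xs_scaled, same_direction) / covariance(xs, same_direction)
print(ratio)  # approximately 100
```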

Suppose a dataset contains the heights and weights of a group of people. To calculate the covariance of height and weight:

  • Calculate the expected value (average) for height and for weight by summing each variable’s observations and dividing by the number of observations.
  • Subtract the expected value from each observation to get deviations for both height and weight.
  • Multiply the height and weight deviations for each observation and sum these products.
  • Divide the sum by n to obtain the covariance of height and weight.
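The four steps above can be traced in plain Python. The heights and weights below are hypothetical values chosen for easy arithmetic, not real measurements.

```python
# Hypothetical data for five people (illustrative values only).
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
weights_kg = [55.0, 62.0, 66.0, 70.0, 77.0]
n = len(heights_cm)

# Step 1: expected value (average) of each variable.
mean_height = sum(heights_cm) / n  # 170.0
mean_weight = sum(weights_kg) / n  # 66.0

# Step 2: deviations of each observation from its expected value.
height_dev = [h - mean_height for h in heights_cm]  # [-10, -5, 0, 5, 10]
weight_dev = [w - mean_weight for w in weights_kg]  # [-11, -4, 0, 4, 11]

# Step 3: multiply paired deviations and sum the products.
sum_of_products = sum(dh * dw for dh, dw in zip(height_dev, weight_dev))  # 260.0

# Step 4: divide by n.
cov_height_weight = sum_of_products / n
print(cov_height_weight)  # 52.0, in units of cm x kg
```

The positive result reflects that, in this made-up sample, taller people tend to be heavier.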

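The sign of the resulting covariance indicates whether taller individuals tend to be heavier (positive covariance) or lighter (negative covariance). Its magnitude is expressed in height units times weight units (for example, centimeter-kilograms), so it measures the relationship’s strength only in those original units.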

Covariance is used to analyze relationships between variables in fields such as statistics and machine learning.

  • Covariance is not normalized and therefore cannot be directly compared across different variable pairs.
  • To standardize covariance and compare relationship strength across variables, divide the covariance by the product of the two variables’ standard deviations to obtain the correlation coefficient.
  • The correlation coefficient ranges from -1 to 1:
    • -1 indicates a perfect negative linear relationship.
    • 1 indicates a perfect positive linear relationship.
    • 0 indicates no linear relationship.
  • Covariance indicates direction and gives a raw measure of strength, but it does not provide a standardized measure of relationship strength.
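The standardization described above can be sketched in plain Python. The height and weight values are hypothetical, and `covariance`, `std_dev`, and `correlation` are our own helper names rather than a library API. Both the covariance and the standard deviations divide by n here, so the factors of n cancel and the coefficient equals the one obtained with the sample (n - 1) convention.

```python
import math


def covariance(xs, ys):
    # Population covariance: average product of paired deviations from the means.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def std_dev(xs):
    # Population standard deviation: square root of Cov(X, X), i.e. of the variance.
    return math.sqrt(covariance(xs, xs))


def correlation(xs, ys):
    # Divide the covariance by the product of the two standard deviations.
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical data: the same heights expressed in two different units.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
heights_m = [h / 100 for h in heights_cm]
weights_kg = [55.0, 62.0, 66.0, 70.0, 77.0]

print(covariance(heights_cm, weights_kg), covariance(heights_m, weights_kg))  # differ by a factor of 100
r_cm = correlation(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)
print(r_cm, r_m)  # same value (up to rounding), close to 1
print(correlation([1, 2, 3], [6, 4, 2]))  # perfect negative linear relationship: about -1
```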

Related terms:

  • Correlation coefficient
  • Standard deviation
  • Expected value