Pearson Correlation Coefficient
- Quantifies the strength and direction of a linear relationship between two continuous variables.
- Values range from -1 (perfect negative linear) to 1 (perfect positive linear); 0 indicates no linear correlation.
- Only detects linear relationships and can be strongly affected by outliers.
Definition
The Pearson correlation coefficient is a measure of the strength and direction of the linear relationship between two continuous variables. It is denoted by the symbol “r” and is calculated as:

$$
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}
$$
In the formula, x and y are the two variables being analyzed, x̄ and ȳ are the means of those variables, and ∑ represents the sum over all paired observations. The coefficient ranges from -1 to 1, with 0 indicating no linear correlation.
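As a minimal sketch, the definition can be computed directly in Python. The function name `pearson_r` and the sample values below are illustrative assumptions, not data from this page.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Numerator: sum of products of paired deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Made-up paired observations, for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(round(r, 4))  # roughly 0.77 for this made-up sample
```

The explicit check for a zero denominator matters because r is undefined when one of the variables does not vary at all.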
Explanation
- A positive r indicates that as one variable increases, the other tends to increase (positive linear association). Income and education level are given as an example of a positive correlation.
- A negative r indicates that as one variable increases, the other tends to decrease (negative linear association). Age and physical strength are given as an example of a negative correlation.
- The Pearson correlation coefficient measures only linear relationships; it does not reliably capture nonlinear (curved) relationships.
- The coefficient is sensitive to outliers, which can disproportionately influence its value.
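The sign of r and its blindness to curved relationships can be illustrated with a small sketch. The helper `pearson_r` and all three data sets are hypothetical, chosen only to make the behavior easy to see.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviations about the means (no input validation)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


xs = [-3, -2, -1, 0, 1, 2, 3]

# Perfectly linear, increasing: r is 1.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# Perfectly linear, decreasing: r is -1.
r_down = pearson_r(xs, [-2 * x + 1 for x in xs])

# Perfectly predictable but curved (y = x^2 on a symmetric range): r is 0,
# even though y is completely determined by x.
r_curve = pearson_r(xs, [x ** 2 for x in xs])

print(r_up, r_down, r_curve)
```

The last case shows why r = 0 should be read as "no linear correlation" and not as "no relationship".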
Examples
Example 1: The relationship between height and weight
A sample of 100 individuals yields a mean height of 69 inches and a mean weight of 150 pounds. Applying the formula above to the paired height and weight measurements gives r = 0.79. This indicates a strong positive correlation between height and weight. The data should still be checked for outliers that might have influenced this result.
Example 2: The relationship between hours of sleep and test scores
A sample of 50 students yields a mean of 7 hours of sleep and a mean test score of 75. The Pearson correlation coefficient is r = -0.34, indicating a moderate negative correlation between hours of sleep and test scores: in this sample, more sleep tends to go with lower scores. Other factors that could influence this relationship, such as students’ study habits or general level of motivation, should be considered before drawing conclusions.
Notes or pitfalls
- Only measures linear relationships; nonlinear associations (e.g., curved relationships) will not be accurately captured.
- Sensitive to outliers: observations that differ substantially from the rest can greatly influence the coefficient and potentially lead to misleading results.
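The outlier pitfall can be sketched with a tiny made-up data set. The helper `pearson_r` and every value below are assumptions for illustration; the point is only how much a single extreme observation can move r.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from deviations about the means (no input validation)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Six points lying exactly on a decreasing line.
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 5, 4, 3, 2, 1]
r_clean = pearson_r(xs, ys)  # -1: perfect negative linear relationship

# The same data plus one extreme observation far from the rest.
r_with_outlier = pearson_r(xs + [20], ys + [20])  # about 0.86

print(round(r_clean, 2), round(r_with_outlier, 2))
```

In this sketch a single added point flips the coefficient from perfectly negative to strongly positive, which is why plotting the data before interpreting r is a sensible habit.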
Related terms
- Linear relationship
- Outliers
- Correlation (statistical)