Covariance
The extent to which two random variables vary together (co-vary) can be measured by their covariance. Consider the two random variables x and y:
(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
For two random variables x and y having means E{x} and E{y}, the covariance is defined as:
Cov(x,y) = E{[x - E{x}][y - E{y}]}
The covariance calculation begins with pairs of x and y, takes the difference of each value from its mean, and multiplies these differences together. For instance, if this product is positive for x1 and y1, then for that pair of data points the values of x and y have varied together in the same direction from their means. If the product is negative, they have varied in opposite directions. The larger the magnitude of the product, the more strongly that pair pulls the covariance toward its own sign. The covariance is defined as the mean value of this product, taken over all pairs of data points xi and yi. If the covariance is zero, then the cases in which the product was positive were offset by those in which it was negative, and there is no linear relationship between the two random variables.
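To make the definition concrete, here is a minimal Python sketch that treats each list of observations as equally likely outcomes, so that E{.} becomes an ordinary arithmetic mean. The functions `mean` and `covariance` are hypothetical helpers written for illustration, not part of any library. Dividing by n follows the "mean value of this product" definition; many statistics packages divide by n - 1 instead to obtain an unbiased sample estimate.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, standing in for the expectation E{.}."""
    return sum(values) / len(values)


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance as the mean product of deviations from the means (divides by n)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and have the same length")
    mean_x, mean_y = mean(x), mean(y)
    # Each product is positive when the pair deviates from the means in the
    # same direction and negative when it deviates in opposite directions.
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return mean(products)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(covariance(x, y))  # 1.2
```

The positive result reflects that, in this small data set, larger x values tend to be paired with larger y values.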
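Computationally, it is more efficient to use the following equivalent formula, because the sums it needs can be accumulated in a single pass over the data. In floating-point arithmetic, however, it can lose precision to round-off error when the means are large relative to the spread of the values: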
Cov(x,y) = E{xy} - E{x}E{y}
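The sketch below, again in Python with hypothetical function names, accumulates the sums of x, y, and xy in one loop and applies Cov(x,y) = E{xy} - E{x}E{y}, then checks the result against the deviation-based definition. Because the two formulas are algebraically equal but round differently, the comparison uses a tolerance rather than exact equality.

```python
import math
from typing import Sequence


def covariance_one_pass(x: Sequence[float], y: Sequence[float]) -> float:
    """Cov(x,y) = E{xy} - E{x}E{y}, with all sums gathered in a single loop."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and have the same length")
    n = len(x)
    sum_x = sum_y = sum_xy = 0.0
    for xi, yi in zip(x, y):
        # One pass collects every running sum the formula needs.
        sum_x += xi
        sum_y += yi
        sum_xy += xi * yi
    return sum_xy / n - (sum_x / n) * (sum_y / n)


def covariance_definition(x: Sequence[float], y: Sequence[float]) -> float:
    """Reference version E{[x - E{x}][y - E{y}]}, which needs the means first."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(math.isclose(covariance_one_pass(x, y), covariance_definition(x, y)))  # True
```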
The value of the covariance is interpreted as follows:
Positive covariance - indicates that higher-than-average values of one variable tend to be paired with higher-than-average values of the other variable.
Negative covariance - indicates that higher-than-average values of one variable tend to be paired with lower-than-average values of the other variable.
Zero covariance - if the two random variables are independent, the covariance will be zero. However, a covariance of zero does not necessarily mean that the variables are independent: a nonlinear relationship can exist that would still result in a covariance of zero.
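A small Python example, using a hypothetical `covariance` helper that divides by n, illustrates all three cases. In the last case y is completely determined by x (y = x^2), yet the positive and negative products cancel exactly because the x values are symmetric about zero.

```python
def covariance(x, y):
    """Mean of the products of deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


x = [-2.0, -1.0, 0.0, 1.0, 2.0]

rising = [2 * xi for xi in x]      # higher x paired with higher y
falling = [-xi for xi in x]        # higher x paired with lower y
parabola = [xi ** 2 for xi in x]   # y depends on x, but not linearly

print(covariance(x, rising))    # 4.0
print(covariance(x, falling))   # -2.0
print(covariance(x, parabola))  # 0.0
```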
Useful Properties
The variance of the sum of two random variables can be written as:
Var(x + y) = Var(x) + Var(y) + 2Cov(x,y)
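The identity can be checked numerically. The sketch below assumes the divide-by-n definition and uses hypothetical helper names; it computes both sides for a small data set and, because of floating-point rounding, compares them with `math.isclose` rather than `==`.

```python
import math


def covariance(x, y):
    """Mean of the products of deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def variance(x):
    """Var(x) is the covariance of x with itself."""
    return covariance(x, x)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_plus_y = [xi + yi for xi, yi in zip(x, y)]

lhs = variance(x_plus_y)
rhs = variance(x) + variance(y) + 2 * covariance(x, y)
print(lhs, math.isclose(lhs, rhs))  # 5.6 True
```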
When the random variables are multiplied by constants a and b, respectively, the covariance scales as follows:
Cov(ax,by) = abCov(x,y)
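A quick numerical check of the scaling rule, again a sketch with a hypothetical helper and arbitrarily chosen constants a = 3 and b = -2. A negative constant flips the sign of the covariance, as the product ab predicts.

```python
import math


def covariance(x, y):
    """Mean of the products of deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = 3.0, -2.0  # arbitrary illustrative constants

scaled = covariance([a * xi for xi in x], [b * yi for yi in y])
print(math.isclose(scaled, a * b * covariance(x, y)))  # True
```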
Limitations
Because the value of the covariance depends on the units in which the data are measured, it is difficult to compare covariances among data sets having different scales. A value that might represent a strong linear relationship for one data set might represent a very weak one in another.
The correlation coefficient addresses this issue by dividing the covariance by the product of the standard deviations of the two variables, creating a dimensionless quantity that facilitates the comparison of different data sets.
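The sketch below illustrates both points with hypothetical data and helper names: expressing x in centimeters instead of meters multiplies the covariance by 100, while the correlation coefficient, computed as Cov(x,y) divided by the product of the standard deviations, does not change.

```python
import math


def covariance(x, y):
    """Mean of the products of deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    # Each standard deviation is the square root of a variable's covariance with itself.
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


x_m = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical lengths in meters
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_cm = [100 * xi for xi in x_m]   # the same lengths in centimeters

print(covariance(x_m, y), covariance(x_cm, y))    # covariance grows 100-fold
print(correlation(x_m, y), correlation(x_cm, y))  # correlation is unchanged
```

Because the 1/n factors cancel in the ratio, the correlation comes out the same whether the covariance and variances divide by n or by n - 1, as long as the choice is consistent.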