Thursday, June 2, 2016

Mix-up about the covariance in Excel

One of the Excel mix-ups in its statistical functions that lasted longer is about the computation of the covariance.

Until recently (we think that this change was only introduced in the 2007 version) Excel provided two functions for the variance and standard deviation, one,for population and the other for sample data.

However, there was only one function for the covariance that returned the value for population data. There was (an there is) also only one function for the correlation, which is right, since the correlation values are the same for population ans sample data. This situation could lead to the erroneous conclusion that the covariance also has the same value for both cases.

The fixing of this situation was made by introducing two formulas for the computation of the covariance.

However, the procedure Covariance  of the Analysis ToolPak still returns the population covariances. Additionally, it also maintains another heritage mix-up. Although the Toolpak general methodology presents fixed values in the output, that does not change automatically if data values change, in this case the diagonals of the covariance matrix contain the formula to compute the population variance. This means that changes in the data will automatically change values in the diagonal, contrary to what happens for the other elements of the matrix.

The matrix.xla add-in, presented in another message also has only one function for the covariance matrix that returns the values for population data.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.