Mahalanobis distance in R, error: system is computationally singular

Submitted by ↘锁芯ラ on 2019-11-27 03:20:42

Question


I'd like to calculate multivariate distance from a set of points to the centroid of those points. Mahalanobis distance seems to be suited for this. However, I get an error (see below).

Can anyone tell me why I am getting this error, and if there is a way to work around it?

If you download the coordinate data and the associated environmental data, you can run the following code.

library(maptools)   # only needed to read the shapefile
occ <- readShapeSpatial('occurrences.shp')   # point coordinates (not used below)
load('envDat.Rdata')   # provides the environmental data `dat`

# standardize the data to scale the variables
dat <- as.matrix(scale(dat))
centroid <- dat[1547, ]   # let's assume this is the centroid in this case

# calculate the multivariate distance from all points to the centroid
mahalanobis(dat, center = centroid, cov = cov(dat))

Error in solve.default(cov, ...) : 
  system is computationally singular: reciprocal condition number = 9.50116e-19

Answer 1:


The Mahalanobis distance requires the inverse of the covariance matrix. The function mahalanobis computes it internally with solve, which inverts the matrix numerically. Before inverting, solve estimates the matrix's reciprocal condition number, and if that number falls below a tolerance (tol, which defaults to .Machine$double.eps, about 2.2e-16), it treats the matrix as singular and stops. Your reciprocal condition number, 9.5e-19, is below that default. This is why the error says the matrix is computationally singular: it might not be singular given a different tolerance.
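To check in advance whether solve is likely to refuse a covariance matrix, you can look at its reciprocal condition number with base R's rcond() and compare it with solve's default tolerance. The sketch below uses simulated data (not the question's envDat.Rdata) in which a hypothetical fifth column is an exact linear combination of two others; is_safely_invertible() is an illustrative helper, not part of any package.

```r
set.seed(1)
x <- matrix(rnorm(400), ncol = 4)          # 100 simulated observations, 4 variables
x_bad <- cbind(x, x[, 1] + 2 * x[, 2])     # hypothetical 5th column: exact linear combination

S_ok  <- cov(x)
S_bad <- cov(x_bad)

# Illustrative helper: TRUE when the reciprocal condition number is at least
# the tolerance that solve() uses by default
is_safely_invertible <- function(S, tol = .Machine$double.eps) {
  rcond(S) >= tol
}

rcond(S_ok)    # comfortably away from zero
rcond(S_bad)   # on the order of machine precision (or exactly 0)
is_safely_invertible(S_ok)
```

A tiny value for the real covariance matrix points to (numerically) linearly dependent columns rather than to a tolerance that is merely too strict.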

The workaround is to lower that tolerance. Fortunately, mahalanobis passes extra arguments on to solve, so you can supply tol directly:

mahalanobis(dat,center=centroid,cov=cov(dat),tol=1e-20)
# [1] 24.215494 28.394913  6.984101 28.004975 11.095357 14.401967 ...
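Because mahalanobis forwards extra arguments through `...` to solve, you can also invert the covariance once yourself and pass the result with inverted = TRUE, which is handy when you need distances to several centers. A minimal sketch on simulated data, reusing the tolerance from this answer:

```r
set.seed(11)
x   <- matrix(rnorm(300), ncol = 3)   # simulated data, not the question's
ctr <- colMeans(x)
S   <- cov(x)

# tol is passed through mahalanobis()'s `...` to solve()
d_a <- mahalanobis(x, center = ctr, cov = S, tol = 1e-20)

# Same result: invert once, then tell mahalanobis() the matrix is already inverted
S_inv <- solve(S, tol = 1e-20)
d_b   <- mahalanobis(x, center = ctr, cov = S_inv, inverted = TRUE)
```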



Answer 2:


mahalanobis uses the covariance matrix cov (more precisely, its inverse) to transform the coordinate system, then computes the Euclidean distance in the new coordinates. A standard reference is Duda & Hart, "Pattern Classification and Scene Analysis".
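The value mahalanobis returns is the squared distance (x - mu)' S^{-1} (x - mu). The sketch below, on simulated correlated data, makes the "transform, then Euclidean" view concrete by whitening with the Cholesky factor of S (one standard choice of transformation; an eigendecomposition would also work) and checking the result against mahalanobis.

```r
set.seed(42)
# 100 simulated points with correlated, differently scaled columns
x  <- matrix(rnorm(200), ncol = 2) %*% matrix(c(2, 0.5, 0, 1), nrow = 2)
mu <- colMeans(x)
S  <- cov(x)

R <- chol(S)                      # upper triangular, S = t(R) %*% R
centered <- t(sweep(x, 2, mu))    # 2 x 100: one centered point per column
z <- backsolve(R, centered, transpose = TRUE)  # solves t(R) %*% z = centered

d2_whitened <- colSums(z^2)       # squared Euclidean distance in whitened coordinates
d2_builtin  <- mahalanobis(x, center = mu, cov = S)
```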

It looks like your covariance matrix is singular. Perhaps there are linearly dependent columns in dat that are unnecessary? Lowering the tolerance, even to zero, won't help if the covariance matrix is truly singular. Instead, the first thing to do is look for columns that are a rescaling of another column, or simply the sum of two or more other columns, and remove them. Such columns are redundant for the Mahalanobis distance.
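One way to find such columns is a pivoted QR decomposition with base R's qr(): its rank is the number of linearly independent columns, and its pivot vector lists those columns first. The sketch below builds a hypothetical data set in which v6 is a rescaling of v2 and v7 is the sum of v1 and v4, then drops the redundant columns before computing the distances. qr()'s default tolerance (1e-07) decides what counts as dependent, so near-dependent columns in real data may deserve a closer look.

```r
set.seed(7)
dat <- matrix(rnorm(500), ncol = 5, dimnames = list(NULL, paste0("v", 1:5)))
# hypothetical redundant columns: a rescaling and a sum of other columns
dat <- cbind(dat, v6 = 3 * dat[, "v2"], v7 = dat[, "v1"] + dat[, "v4"])

q <- qr(dat)                              # pivoted QR, default tol = 1e-07
keep <- q$pivot[seq_len(q$rank)]          # indices of linearly independent columns
redundant <- colnames(dat)[-keep]         # columns that can be dropped

dat_reduced <- dat[, keep, drop = FALSE]
S_reduced <- cov(dat_reduced)
d2 <- mahalanobis(dat_reduced, center = colMeans(dat_reduced), cov = S_reduced)
```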

BTW, since the Mahalanobis distance is effectively a rescaling and rotation, calling scale() looks superfluous; any reason why you want that?
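The reason scale() is redundant is that the Mahalanobis distance does not change under an invertible rescaling of the variables, provided the center and covariance are computed from the rescaled data. A quick check on simulated data with very different column scales:

```r
set.seed(3)
x <- matrix(rnorm(300), ncol = 3)
x[, 2] <- 100 * x[, 2] + x[, 1]   # very different scale, correlated with column 1
xs <- scale(x)                     # standardized copy, as in the question

d_raw    <- mahalanobis(x,  center = colMeans(x),  cov = cov(x))
d_scaled <- mahalanobis(xs, center = colMeans(xs), cov = cov(xs))
```

The two vectors agree up to floating-point error, so standardizing first neither helps nor hurts the singularity problem.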



Source: https://stackoverflow.com/questions/22134398/mahalonobis-distance-in-r-error-system-is-computationally-singular
