Mahalanobis distance in R

前端 未结 6 2123
灰色年华
灰色年华 2020-12-13 22:37

I have found the mahalanobis.dist function in package StatMatch (http://cran.r-project.org/web/packages/StatMatch/StatMatch.pdf) but it isn\'t doing exactly what I want. It

相关标签:
6条回答
  • 2020-12-13 23:06

    How about using the mahalanobis function in the stats package:

     mahalanobis(x, center, cov, inverted = FALSE, ...)
    
    0 讨论(0)
  • 2020-12-13 23:09

    There a very easy way to do it using R Package "biotools". In this case you will get a Squared Distance Mahalanobis Matrix.

    #Manly (2004, p.65-66)
    
    x1 <- c(131.37, 132.37, 134.47, 135.50, 136.17)
    x2 <- c(133.60, 132.70, 133.80, 132.30, 130.33)
    x3 <- c(99.17, 99.07, 96.03, 94.53, 93.50)
    x4 <- c(50.53, 50.23, 50.57, 51.97, 51.37)
    
    #size (n x p) #Means 
    x <- cbind(x1, x2, x3, x4) 
    
    #size (p x p) #Variances and Covariances
    Cov <- matrix(c(21.112,0.038,0.078,2.01, 0.038,23.486,5.2,2.844, 
            0.078,5.2,24.18,1.134, 2.01,2.844,1.134,10.154), 4, 4)
    
    library(biotools)
    Mahalanobis_Distance<-D2.dist(x, Cov)
    print(Mahalanobis_Distance)
    
    0 讨论(0)
  • 2020-12-13 23:13

    Mahalanobis distance is equivalent to (squared) Euclidean distance if the covariance matrix is identity. If you have covariance between your variables, you can make Mahalanobis and sq Euclidean equal by whitening the matrix first to remove the covariance. I.e., do:

    #X is your matrix
    if (!require("whitening")) install.packages("whitening")
    
    X <- whitening::whiten(X) # default is ZCA (Mahalanobis) whitening
    X_dist <- dist(X, diag = T, method = "euclidean")^2
    

    You can confirm that this gives you the same distance matrix as the code that Davit provided in one of the previous answers.

    0 讨论(0)
  • 2020-12-13 23:17

    You can wrap the function stats::mahalanobis as bellow to output a mahalanobis distance matrix (pairwise mahalanobis distances):

    # x - data frame
    # cx - covariance matrix; if not provided, 
    #      it will be estimated from the data
    mah <- function(x, cx = NULL) {
      if(is.null(cx)) cx <- cov(x)
      out <- lapply(1:nrow(x), function(i) {
        mahalanobis(x = x, 
                    center = do.call("c", x[i, ]),
                    cov = cx)
      })
      return(as.dist(do.call("rbind", out)))
    }
    

    Then, you can cluster your data and plot it, for example:

    # Dummy data
    x <- data.frame(X = c(rnorm(10, 0), rnorm(10, 5)), 
                    Y = c(rnorm(10, 0), rnorm(10, 7)), 
                    Z = c(rnorm(10, 0), rnorm(10, 12)))
    rownames(x) <- LETTERS[1:20]
    plot(x, pch = LETTERS[1:20])
    

    # Comute the mahalanobis distance matrix
    d <- mah(x)
    d
    
    # Cluster and plot
    hc <- hclust(d)
    plot(hc)
    

    0 讨论(0)
  • 2020-12-13 23:17

    Your output before taking the square root is :

    inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix
              [,1]        [,2]
    x -0.004349227 -0.01304768
    y  0.114529639  0.34358892
    

    I think you can'take the square root of negative numbers number, so you have NAN for negative elements:

     sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix)
           [,1]      [,2]
    x       NaN       NaN
    y 0.3384223 0.5861646
    
    Warning message:
    In sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix) :
      NaNs produced
    
    0 讨论(0)
  • 2020-12-13 23:22

    I've been trying this out from the same website that you looked at and then stumbled upon this question. I managed to get the script to work, But I get a different result.

    #WORKING EXAMPLE
    #MAHALANOBIS DIST OF TWO MATRICES
    
    #define matrix
    mat1<-matrix(data=c(2,2,6,7,4,6,5,4,2,1,2,5,5,3,7,4,3,6,5,3),nrow=10)
    mat2<-matrix(data=c(6,7,8,5,5,5,4,7,6,4),nrow=5)
    #center data
    mat1.1<-scale(mat1,center=T,scale=F)
    mat2.1<-scale(mat2,center=T,scale=F)
    #cov matrix
    mat1.2<-cov(mat1.1,method="pearson")
    mat2.2<-cov(mat2.1,method="pearson")
    n1<-nrow(mat1)
    n2<-nrow(mat2)
    n3<-n1+n2
    #pooled matrix
    mat3<-((n1/n3)*mat1.2) + ((n2/n3)*mat2.2)
    #inverse pooled matrix
    mat4<-solve(mat3)
    #mean diff
    mat5<-as.matrix((colMeans(mat1)-colMeans(mat2)))
    #multiply
    mat6<-t(mat5) %*% mat4
    #multiply
    sqrt(mat6 %*% mat5)
    

    I think the function mahalanobis() is used to compute mahalanobis distances between individuals (rows) in one matrix. The function pairwise.mahalanobis() from package(HDMD) can compare two or more matrices and give mahalanobis distances between the matrices.

    0 讨论(0)
提交回复
热议问题