Parallel distance Matrix in R

忘掉有多难 2020-12-08 17:14

Currently I'm using the built-in function dist to calculate my distance matrix in R.

dist(featureVector, method = "manhattan")

This is curr

6 answers
  •  再見小時候
    2020-12-08 17:36

    I am also working with fairly large distance matrices and trying to speed up the computation. Will Benson above is likely correct when he says that "the time to start up the function and export the variables to the cluster would probably be longer than just using".

    However, I think this applies only to distance matrices of small to moderate size. See the example below, which compares Dist from the package amap (with 10 processors), dist from the package stats, and rdist from the package fields, which calls a Fortran function. The first example creates a 400 x 400 distance matrix. The second creates a 3103 x 3103 distance matrix.

    require(sp)      # provides the meuse.grid data set
    require(fields)  # provides rdist
    require(amap)    # provides Dist
    data(meuse.grid)
    meuse.gridA <- meuse.grid[1:400, 1:2]  # first 400 points, x/y coordinates only
    meuse.gridB <- meuse.grid[, 1:2]       # all 3103 points

    # small distance matrix (400 x 400)
    a <- Sys.time()
    invisible(dist(meuse.gridA, diag = TRUE, upper = TRUE))
    Sys.time() - a
    # Time difference of 0.002138376 secs
    a <- Sys.time()
    invisible(Dist(meuse.gridA, nbproc = 10, diag = TRUE, upper = TRUE))
    Sys.time() - a
    # Time difference of 0.005409241 secs
    a <- Sys.time()
    invisible(rdist(meuse.gridA))
    Sys.time() - a
    # Time difference of 0.02312016 secs

    # large distance matrix (3103 x 3103)
    a <- Sys.time()
    invisible(dist(meuse.gridB, diag = TRUE, upper = TRUE))
    Sys.time() - a
    # Time difference of 0.09845328 secs
    a <- Sys.time()
    invisible(Dist(meuse.gridB, nbproc = 10, diag = TRUE, upper = TRUE))
    Sys.time() - a
    # Time difference of 0.05900002 secs
    a <- Sys.time()
    invisible(rdist(meuse.gridB))
    Sys.time() - a
    # Time difference of 0.8928168 secs
    

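    Note that for the large distance matrix (3103 x 3103), the computation time dropped from 0.09845328 secs with dist to 0.05900002 secs with Dist. For the small matrix (400 x 400), Dist was slower than dist (0.005409241 secs versus 0.002138376 secs), which is consistent with the start-up overhead mentioned above. As such, I would suggest that you use the function Dist from the amap package for large matrices, provided you have several processors available.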
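
Since the question uses the Manhattan metric, here is a minimal sketch of the same comparison with method = "manhattan". The featureVector below is a hypothetical random stand-in for the asker's data, and nbproc = 2 is an arbitrary choice; neither comes from the original post. The amap call is guarded so the script still runs when the package is not installed.

```r
# Hypothetical stand-in for the asker's featureVector:
# 500 observations with 8 numeric features each.
set.seed(1)
featureVector <- matrix(rnorm(500 * 8), nrow = 500)

# Serial baseline using dist from the stats package.
t_serial <- system.time(
  d_serial <- dist(featureVector, method = "manhattan")
)

if (requireNamespace("amap", quietly = TRUE)) {
  # nbproc is the number of subprocesses; 2 is an assumption,
  # adjust it to the number of cores on your machine.
  t_parallel <- system.time(
    d_parallel <- amap::Dist(featureVector, method = "manhattan", nbproc = 2)
  )

  # Both functions return "dist" objects; compare the stored values.
  print(all.equal(as.vector(d_serial), as.vector(d_parallel)))

  # Elapsed wall-clock time for each version.
  print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
}
```

On a matrix this small the parallel version may well be slower, so it is worth timing both on data of realistic size before switching.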
