Parallel distance Matrix in R

忘掉有多难 2020-12-08 17:14

Currently I'm using the built-in function dist to calculate my distance matrix in R.

dist(featureVector, method = "manhattan")

This is curr

6 answers

执笔经年 (original poster)

2020-12-08 17:42
I am a Windows user looking for an efficient way to compute a distance matrix for hierarchical clustering (for example with the function hclust from the "stats" package). The function Dist doesn't work in parallel on Windows, so I had to look for something different, and I found Stefan Evert's "wordspace" package, which contains the dist.matrix function. You can try this code:
```
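library(wordspace)  # provides dist.matrix()
# 5000 instances (rows) x 1000 binary features (columns)
X <- data.frame(replicate(1000, sample(0:1, 5000, replace = TRUE)))
system.time(d <- dist(X, method = "manhattan"))
system.time(d2 <- as.dist(dist.matrix(as.matrix(X), method = "manhattan")))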
```
As you can see, computing the distance matrix for a data frame with 1000 binary features and 5000 instances is much faster with dist.matrix.

These are the results on my laptop (i7-6500U):
```
> system.time(d <- dist(X, method = "manhattan"))
   user  system elapsed 
 151.79    0.04  152.59 
> system.time(d2 <- as.dist( dist.matrix(as.matrix(X), method="manhattan") ))
   user  system elapsed 
  19.19    0.22   19.56 
```
This solved my problem. Here you can check the original thread where I found it: http://r.789695.n4.nabble.com/Efficient-distance-calculation-on-big-matrix-td4633598.html
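
If you want to convince yourself that the two functions return the same distances before switching, a small check along these lines may help. It is only a sketch: the matrix size, the seed, and the names X_small, d_base, d_ws and hc are made up for illustration, and the wordspace comparison is skipped when the package is not installed.

```r
# Small sanity check on a toy matrix (200 instances x 50 binary features).
set.seed(1)
X_small <- matrix(sample(0:1, 200 * 50, replace = TRUE), nrow = 200)

# Reference result from base R.
d_base <- dist(X_small, method = "manhattan")

# Compare with wordspace::dist.matrix() if the package is available.
if (requireNamespace("wordspace", quietly = TRUE)) {
  d_ws <- as.dist(wordspace::dist.matrix(X_small, method = "manhattan"))
  print(all.equal(as.vector(d_base), as.vector(d_ws)))
}

# Either "dist" object can be passed straight to hclust().
hc <- hclust(d_base, method = "average")
```

The linkage method "average" is an arbitrary choice here; use whatever your clustering needs.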

dist.matrix doesn't run in parallel, but it is fast enough in many cases.
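
If you really do need a parallel version on Windows, one possible approach is a socket cluster from the base "parallel" package, which does not rely on forking. The following is a rough sketch, not something I have benchmarked: manhattan_block and parallel_manhattan are hypothetical helper names, the worker count is arbitrary, and for small inputs the cost of starting workers and copying the matrix to them can easily outweigh any gain.

```r
library(parallel)

# Manhattan distances from the rows `idx` of M to every row of M.
# Returns a length(idx) x nrow(M) matrix.
manhattan_block <- function(idx, M) {
  tM <- t(M)  # features x instances, so M[i, ] recycles down each column
  t(vapply(idx, function(i) colSums(abs(tM - M[i, ])), numeric(nrow(M))))
}

# Split the rows into one chunk per worker and stack the partial results.
parallel_manhattan <- function(X, n_workers = 2) {
  X <- as.matrix(X)
  cl <- makeCluster(n_workers)  # socket (PSOCK) cluster, available on Windows
  on.exit(stopCluster(cl))
  chunks <- splitIndices(nrow(X), n_workers)
  blocks <- parLapply(cl, chunks, manhattan_block, M = X)
  as.dist(do.call(rbind, blocks))
}

# Usage on a toy matrix, checked against base dist().
set.seed(42)
X_toy <- matrix(sample(0:1, 100 * 20, replace = TRUE), nrow = 100)
d_par <- parallel_manhattan(X_toy, n_workers = 2)
print(all.equal(as.vector(d_par), as.vector(dist(X_toy, method = "manhattan"))))
```

Note that this computes the full square matrix and then keeps only the lower triangle, so it does about twice the arithmetic that is strictly needed.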