Currently I'm using the built-in function dist to calculate my distance matrix in R.
dist(featureVector, method = "manhattan")
This is curr
I am a Windows user looking for an efficient way to compute a distance matrix for use in hierarchical clustering (for example with the function hclust from the "stats" package). The function Dist doesn't work in parallel on Windows, so I had to look for something different, and I found Stefan Evert's "wordspace" package, which contains the function dist.matrix.
You can try this code:
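library(wordspace)  # provides dist.matrix
# 5000 instances (rows) x 1000 binary features (columns)
X <- data.frame(replicate(1000, sample(0:1, 5000, replace = TRUE)))
system.time(d <- dist(X, method = "manhattan"))
system.time(d2 <- as.dist(dist.matrix(as.matrix(X), method = "manhattan")))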
As you can see, computing the distance matrix for a data frame with 1000 binary features and 5000 instances is much faster with dist.matrix.
These are the results on my laptop (i7-6500U):
> system.time(d <- dist(X, method = "manhattan"))
   user  system elapsed
 151.79    0.04  152.59
> system.time(d2 <- as.dist( dist.matrix(as.matrix(X), method="manhattan") ))
   user  system elapsed
  19.19    0.22   19.56
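Before relying on the faster function, it may be worth checking that both approaches give the same distances. The following is only a minimal sketch on a much smaller matrix than the benchmark; the names X_small, d_base, d_ws and same are made up for illustration, and the comparison is skipped if wordspace is not installed.

```r
set.seed(42)
# Small binary matrix (200 rows, 50 columns), stored as doubles.
# This is illustrative data, not the benchmark data from above.
X_small <- matrix(as.numeric(sample(0:1, 200 * 50, replace = TRUE)), nrow = 200)

# Reference result from base R
d_base <- dist(X_small, method = "manhattan")

if (requireNamespace("wordspace", quietly = TRUE)) {
  # Same distances computed with wordspace, converted to a "dist" object
  d_ws <- as.dist(wordspace::dist.matrix(X_small, method = "manhattan"))
  # Compare the raw distance values, ignoring attributes such as labels
  same <- isTRUE(all.equal(as.vector(d_base), as.vector(d_ws)))
  print(same)
} else {
  message("wordspace is not installed; skipping the comparison")
}
```

Comparing as.vector() of both objects sidesteps differences in attributes that the two functions attach to their results.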
This solved my problem. Here you can check the original thread where I found it: http://r.789695.n4.nabble.com/Efficient-distance-calculation-on-big-matrix-td4633598.html
It doesn't run in parallel, but it is fast enough in many cases.
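For completeness, here is a rough sketch of how the resulting distances could be passed on to hclust, which was the original goal. The toy data, the "average" linkage and k = 3 are arbitrary choices for illustration, not recommendations, and the code falls back to the base dist function if wordspace is not installed.

```r
set.seed(1)
# Toy data: 20 instances with 8 binary features (illustrative only)
X_toy <- matrix(as.numeric(sample(0:1, 20 * 8, replace = TRUE)), nrow = 20)

# hclust() expects an object of class "dist", hence the as.dist()
# conversion when the distances come from dist.matrix()
d_toy <- if (requireNamespace("wordspace", quietly = TRUE)) {
  as.dist(wordspace::dist.matrix(X_toy, method = "manhattan"))
} else {
  dist(X_toy, method = "manhattan")  # base R fallback
}

hc <- hclust(d_toy, method = "average")  # linkage chosen arbitrarily
clusters <- cutree(hc, k = 3)            # k = 3 is an arbitrary example
table(clusters)                          # cluster sizes
```

The as.dist call matters here because dist.matrix returns a full square matrix by default, while hclust works on the compact "dist" representation.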