r-daisy

Cluster Analysis in R with missing data

拟墨画扇 submitted on 2019-12-23 18:01:45
Question: I spent a good amount of time trying to find out how to do this. The only answer I have found so far is here: How to perform clustering without removing rows where NA is present in R. Unfortunately, this is not working for me. Here is an example of my data (d in this example):
    Q9Y6X2  NA           -6.350055943  -5.78314068
    Q9Y6X3  NA           NA            -5.78314068
    Q9Y6X6  0.831273549  4.875151493   0.78671493
    Q9Y6Y8  4.831273549  0.457298979   5.59406985
    Q9Y6Z4  4.831273549  4.875151493   NA
Here is what I tried: >
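One thing worth checking with data like this, offered as a sketch rather than a confirmed diagnosis: daisy() from the cluster package skips missing values pairwise, so a pair of rows with no column observed in both gets an NA dissimilarity, and hclust() does not accept NA values. The column names s1 to s3 below are made up for illustration; the values are the five rows quoted in the question.

```r
library(cluster)

# The five example rows; column names s1..s3 are hypothetical.
d <- data.frame(
  s1 = c(NA, NA, 0.831273549, 4.831273549, 4.831273549),
  s2 = c(-6.350055943, NA, 4.875151493, 0.457298979, 4.875151493),
  s3 = c(-5.78314068, -5.78314068, 0.78671493, 5.59406985, NA),
  row.names = c("Q9Y6X2", "Q9Y6X3", "Q9Y6X6", "Q9Y6Y8", "Q9Y6Z4")
)

# daisy() uses only the columns observed in both rows of each pair.
m <- as.matrix(daisy(d, metric = "euclidean"))

# List the pairs whose dissimilarity is NA (upper triangle only,
# so each pair is reported once).
na_pairs <- which(is.na(m) & upper.tri(m), arr.ind = TRUE)
data.frame(row = rownames(m)[na_pairs[, "row"]],
           col = colnames(m)[na_pairs[, "col"]])
```

Here Q9Y6X3 is observed only in the third column and Q9Y6Z4 only in the first two, so that pair has nothing to compare. Rows like these would need to be dropped or imputed before the matrix goes to hclust().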

Compute dissimilarity matrix for large data

落花浮王杯 submitted on 2019-12-21 17:27:17
Question: I'm trying to compute a dissimilarity matrix from a large data frame with both numerical and categorical features. When I run the daisy function from the cluster package, I get the error message: Error: cannot allocate vector of size X. In my case X is about 800 GB. Any idea how I can deal with this problem? It would also be great if someone could help me run the function in parallel across cores. Below you can find the function that computes the dissimilarity matrix on the iris
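For a sense of scale, a rough sketch of the arithmetic: a dissimilarity object stores n(n-1)/2 doubles at 8 bytes each, so memory grows quadratically with the number of rows, and running on more cores does not reduce the size of the result. The helper names diss_bytes and max_rows are made up here, and the random subsample at the end is only one possible workaround, not something taken from the question.

```r
library(cluster)

# Bytes needed to store the n * (n - 1) / 2 pairwise dissimilarities
# as doubles (peak memory during the computation can be higher).
diss_bytes <- function(n) n * (n - 1) / 2 * 8

# Largest n whose dissimilarities fit in the given number of bytes,
# from solving n * (n - 1) / 2 * 8 <= bytes for n.
max_rows <- function(bytes) floor((1 + sqrt(1 + bytes)) / 2)

diss_bytes(nrow(iris))   # storage for the full iris matrix, in bytes
max_rows(8 * 1024^3)     # rows that fit in 8 GiB, as an example budget

# One workaround: compute dissimilarities on a random subsample.
set.seed(1)
idx <- sample(nrow(iris), 50)
d_small <- daisy(iris[idx, ])   # mixed column types, so Gower is used
length(d_small)                 # number of stored pairs
```

Whether a subsample is acceptable depends on what the dissimilarities are used for afterwards.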

Python equivalent of daisy() in the cluster package of R

微笑、不失礼 submitted on 2019-12-20 09:38:17
Question: I have a dataset that contains both categorical (nominal and ordinal) and numerical attributes. I want to calculate the (dis)similarity matrix across my observations using these mixed attributes. Using the daisy() function of the cluster package in R, I can easily get a dissimilarity matrix as follows:
    if(!require("cluster")) { install.packages("cluster"); require("cluster") }
    data(flower)
    as.matrix(daisy(flower, metric = "gower"))
This uses the gower metric to deal with the nominal variables
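If it helps when porting, here is a minimal check of what metric = "gower" computes in the simplest case, assuming only numeric and unordered nominal columns: each numeric column contributes |x_i - x_j| divided by that column's range, each nominal column contributes 0 or 1, and the contributions are averaged. The toy data frame and the gower_pair helper are invented for illustration; ordered factors, which flower also contains, are not handled by this sketch.

```r
library(cluster)

# Hypothetical toy data: one numeric and one nominal column.
toy <- data.frame(
  height = c(1, 3, 5),
  colour = factor(c("red", "red", "blue"))
)

# Gower dissimilarity between rows i and j, written out by hand.
gower_pair <- function(df, i, j) {
  parts <- vapply(df, function(col) {
    if (is.numeric(col)) {
      abs(col[i] - col[j]) / diff(range(col))  # range-scaled difference
    } else {
      as.numeric(col[i] != col[j])             # simple mismatch, 0 or 1
    }
  }, numeric(1))
  mean(parts)
}

# Same order as a dist object: (2,1), (3,1), (3,2).
by_hand <- c(gower_pair(toy, 2, 1), gower_pair(toy, 3, 1), gower_pair(toy, 3, 2))
from_daisy <- as.vector(daisy(toy, metric = "gower"))
all.equal(by_hand, from_daisy)
```

The same per-column logic is what a port to another language would need to reproduce for these two column types.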

Bootstrapped tree values differ from PAST

你。 submitted on 2019-12-11 07:55:40
Question: When I compute a bootstrapped tree in R, I get different values from those I get with PAST (http://folk.uio.no/ohammer/past/). How can I get the output from the two programs to match? Here's what I'm doing in R (data below):
    library("ape")
    library("phytools")
    library("phangorn")
    library("cluster")
    # compute neighbour-joined tree
    f <- function(xx) nj(daisy(xx))
    nj_tree <- f(tab)
    nj_tree_root <- root(nj_tree, 1, r = TRUE)
    ## bootstrap
    # bootstrap values do not match PAST output - why is that?
    nj_tree

Python equivalent of daisy() in the cluster package of R

放肆的年华 submitted on 2019-12-02 19:30:53
I have a dataset that contains both categorical (nominal and ordinal) and numerical attributes. I want to calculate the (dis)similarity matrix across my observations using these mixed attributes. Using the daisy() function of the cluster package in R, I can easily get a dissimilarity matrix as follows:
    if(!require("cluster")) { install.packages("cluster"); require("cluster") }
    data(flower)
    as.matrix(daisy(flower, metric = "gower"))
This uses the gower metric to deal with the nominal variables. Is there a Python equivalent of the daisy() function in R? Or maybe any other module function that

Compute dissimilarity matrix for large data

*爱你&永不变心* submitted on 2019-12-01 08:42:25
I'm trying to compute a dissimilarity matrix from a large data frame with both numerical and categorical features. When I run the daisy function from the cluster package, I get the error message: Error: cannot allocate vector of size X. In my case X is about 800 GB. Any idea how I can deal with this problem? It would also be great if someone could help me run the function in parallel across cores. Below you can find the function that computes the dissimilarity matrix on the iris dataset:
    require(cluster)
    d <- daisy(iris)
I've had a similar issue before. Running daisy() on even 5k
