Simulating Co-occurrence data in R

家住魔仙堡 submitted on 2019-12-11 04:46:07

Question


I am trying to create a co-occurrence data set where the variable of interest is a software application. I want to simulate an n by n matrix where each cell holds the number of times application A was used together with application B. How can I create a data set in R that I can use to test a set of clustering and partitioning algorithms? What model would I use, and how would I generate the data in R?


Answer 1:


n    <- 10
apps <- LETTERS[1:n]
data <- matrix(0, n, n)
rownames(data) <- apps
colnames(data) <- apps

# create artificial clusters
data[1:3, 1:5] <- matrix(sample(3:5, 15, replace = TRUE), 3, 5)
data[6:9, 4:8] <- matrix(sample(1:3, 20, replace = TRUE), 4, 5)

# co-occurrence is symmetric: mirror the blocks so data[i, j] == data[j, i]
data <- pmax(data, t(data))

# clustering
hc <- hclust(dist(data))
plot(hc)
rect.hclust(hc, k = 2)

Note: This answer has been edited to reflect the fact that the co-occurrence matrix must be symmetric.
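To score a clustering against a known answer, it helps to keep the true cluster labels next to the simulated matrix. The sketch below generalizes the block idea above into a hypothetical helper, `sim_cooc_blocks()` (not part of any package). It assigns applications to `k` groups, draws within-group pair counts from a higher range than between-group counts, writes each count to both `[i, j]` and `[j, i]` so the matrix is symmetric, and leaves the diagonal at 0. The ranges `within = 3:6` and `between = 0:1` are arbitrary assumptions; this is one simple planted-partition design among many.

```r
# Hypothetical helper (not from any package): simulate a symmetric
# co-occurrence matrix with planted clusters and return the true labels.
sim_cooc_blocks <- function(n_apps = 12, k = 3, within = 3:6, between = 0:1) {
  apps   <- LETTERS[seq_len(n_apps)]
  groups <- rep(seq_len(k), length.out = n_apps)  # true cluster of each app
  names(groups) <- apps
  m <- matrix(0L, n_apps, n_apps, dimnames = list(apps, apps))
  for (i in seq_len(n_apps - 1)) {
    for (j in (i + 1):n_apps) {
      # same-group pairs draw from the higher 'within' range
      pool <- if (groups[i] == groups[j]) within else between
      # index into pool so a length-1 pool is not misread by sample()
      m[i, j] <- m[j, i] <- pool[sample.int(length(pool), 1)]
    }
  }
  list(counts = m, groups = groups)  # diagonal stays 0
}

set.seed(1)
sim <- sim_cooc_blocks()
stopifnot(isSymmetric(sim$counts))

# cluster as in the answer above, then compare with the planted groups
hc    <- hclust(dist(sim$counts))
found <- cutree(hc, k = 3)
table(true = sim$groups, found = found)
```

In the `table()` output, a perfect recovery shows each true group mapping to exactly one found cluster; the cluster numbers themselves may be permuted.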




Answer 2:


set.seed(42)
# software names:
software <- c("a","b","c","d")
# times each software used:
times.each.sw <- c(5,10,12,3)

# co-occurrence data.frame
swdf <- setNames(data.frame(t(combn(software,2))),c("sw1","sw2"))
swdf$freq.cooc <- apply(combn(times.each.sw,2),2,function(x) sample(1:min(x),1) )
#  sw1 sw2 freq.cooc
#1   a   b         5
#2   a   c         5
#3   a   d         1
#4   b   c         9
#5   b   d         2
#6   c   d         2

If you prefer a co-occurrence matrix, something like this should work:

mat <- diag(times.each.sw) 
dimnames(mat) <- list(software,software)
mat[lower.tri(mat)] <- swdf$freq.cooc
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]

#  a  b  c d
#a 5  5  5 1
#b 5 10  9 2
#c 5  9 12 2
#d 1  2  2 3

The diagonal contains the number of times each application was used (i.e. used with itself). The lower and upper triangles contain the number of times each pair was used together, which can never exceed the usage count of the less frequently used application in the pair.
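One way to satisfy these constraints by construction, rather than sampling each pair separately, is to simulate individual usage sessions and count them. The sketch below assumes that in each session every application is used independently with a fixed, hypothetical probability (`p_use`); the session count `n_sessions` is likewise an arbitrary choice. The co-occurrence matrix is then `crossprod()` of the session-by-application incidence matrix, so the diagonal holds each application's usage count and every off-diagonal cell is at most the smaller of the two corresponding diagonal entries.

```r
set.seed(42)
software   <- c("a", "b", "c", "d")
n_sessions <- 20
# assumed per-session probability that each application is used
p_use <- c(a = 0.25, b = 0.5, c = 0.6, d = 0.15)

# sessions x applications incidence matrix: 1 if the app was used in that session
used <- sapply(p_use, function(p) rbinom(n_sessions, 1, p))
colnames(used) <- software

# cell [i, j] counts the sessions in which both i and j were used;
# the diagonal [i, i] is the number of sessions that used i at all
cooc <- crossprod(used)

# pair counts never exceed the usage count of the less-used app
stopifnot(all(cooc <= outer(diag(cooc), diag(cooc), pmin)))
cooc
```

Because the applications are used independently here, this matrix has no real cluster structure; to plant clusters you could make `p_use` depend on a latent session type.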



Source: https://stackoverflow.com/questions/21616703/simulating-co-occurrence-data-in-r
