I have this dataset:
values<-c(0.002,0.3,0.4,0.005,0.6,0.2,0.001,0.002,0.3,0.01)
codes<-c("A_1","A_2","A_3","B_1","B_2","B_3","B_4","C_1","C_2","C_3")
names(values)<-codes
In the codes, the letter indicates a group and the number a case within each group. Therefore I have three groups and 3 to 4 cases in each group (the actual dataset is much larger but this is a subset).
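Here is a minimal base-R sketch of that naming convention. It assumes every code has the form <group>_<case> with a single underscore. sub() strips the case number to leave the group letter, and strips the group to leave the case. The names groups and cases are illustrative and do not come from the original post.

```r
values <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
codes  <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3", "B_4", "C_1", "C_2", "C_3")
names(values) <- codes

# Group = text before the first underscore; case = text after it.
# Assumes every code has the form "<group>_<case>".
groups <- sub("_.*$", "", codes)
cases  <- sub("^[^_]*_", "", codes)

table(groups)   # number of cases per group
```

Matching on this extracted prefix is stricter than grep("A", codes). The grep() call would also match any code that merely contains an A somewhere, for example one in a hypothetical group named BA.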
Then I calculate the distance matrix:
dist(values)->dist.m
Now I would like to convert dist.m into a dataset with two columns. One column would contain the distances within groups (e.g. between A_1 and A_2, or between B_2 and B_4). The other would contain the distances between groups (e.g. between A_1 and B_1, or between C_1 and B_4).
Is there an easy way to do this in R?
Any help would be much appreciated.
Thank you very much in advance.
Tina.
dist objects may be called distance matrices, but they are not really matrices. There is, however, an as.matrix
function that converts them so you can use matrix indexing:
> as.matrix(dist.m)[grep("A", codes), grep("A", codes) ]
A_1 A_2 A_3
A_1 0.000 0.298 0.398
A_2 0.298 0.000 0.100
A_3 0.398 0.100 0.000
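The following base-R sketch inspects a dist object to illustrate the point above. The object stores only the lower triangle of pairwise distances as a vector, together with attributes. as.matrix() expands it into the full symmetric matrix with row and column names. The variable name full is illustrative.

```r
values <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
codes  <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3", "B_4", "C_1", "C_2", "C_3")
names(values) <- codes
dist.m <- dist(values)

# A "dist" object keeps only the lower triangle, as a plain vector,
# plus attributes such as Size and Labels.
class(dist.m)                 # "dist"
length(dist.m)                # 45 = 10 * 9 / 2 pairwise distances
attr(dist.m, "Size")          # 10
attr(dist.m, "Labels")        # the codes

# as.matrix() expands it into the full symmetric matrix with dimnames,
# which is what makes [row, column] indexing by code possible.
full <- as.matrix(dist.m)
dim(full)                     # 10 10
full["A_1", "A_2"] == full["A_2", "A_1"]   # TRUE
```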
So you can get the within-group part with fairly compact code:
> sapply(LETTERS[1:3], function(let) as.matrix(dist.m)[grep(let, codes), grep(let, codes) ]
+ )
$A
A_1 A_2 A_3
A_1 0.000 0.298 0.398
A_2 0.298 0.000 0.100
A_3 0.398 0.100 0.000
$B
B_1 B_2 B_3 B_4
B_1 0.000 0.595 0.195 0.004
B_2 0.595 0.000 0.400 0.599
B_3 0.195 0.400 0.000 0.199
B_4 0.004 0.599 0.199 0.000
$C
C_1 C_2 C_3
C_1 0.000 0.298 0.008
C_2 0.298 0.000 0.290
C_3 0.008 0.290 0.000
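Here is a variant of the within-group step. It assumes the group is everything before the underscore in each code. split() collects the codes of each group, and lapply() indexes the full matrix once per group. The names groups, full and within are illustrative.

```r
values <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
codes  <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3", "B_4", "C_1", "C_2", "C_3")
names(values) <- codes
full <- as.matrix(dist(values))

# Group = text before the underscore (assumes "<group>_<case>" codes).
groups <- sub("_.*$", "", codes)

# split() gives one vector of codes per group; lapply() always returns a list.
within <- lapply(split(codes, groups), function(g) full[g, g, drop = FALSE])

within$B
```

Using lapply() rather than sapply() avoids a subtle trap. If every group had the same number of cases, sapply() would simplify the result to a single matrix with one flattened column per group, instead of returning a list of matrices.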
Then use negated logical indexing (!grepl) to get the between-group distances:
> sapply(LETTERS[1:3], function(let) as.matrix(dist.m)[grepl(let, codes), !grepl(let, codes) ]
+ )
$A
B_1 B_2 B_3 B_4 C_1 C_2 C_3
A_1 0.003 0.598 0.198 0.001 0.000 0.298 0.008
A_2 0.295 0.300 0.100 0.299 0.298 0.000 0.290
A_3 0.395 0.200 0.200 0.399 0.398 0.100 0.390
$B
A_1 A_2 A_3 C_1 C_2 C_3
B_1 0.003 0.295 0.395 0.003 0.295 0.005
B_2 0.598 0.300 0.200 0.598 0.300 0.590
B_3 0.198 0.100 0.200 0.198 0.100 0.190
B_4 0.001 0.299 0.399 0.001 0.299 0.009
$C
A_1 A_2 A_3 B_1 B_2 B_3 B_4
C_1 0.000 0.298 0.398 0.003 0.598 0.198 0.001
C_2 0.298 0.000 0.100 0.295 0.300 0.100 0.299
C_3 0.008 0.290 0.390 0.005 0.590 0.190 0.009
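The output above lists every between-group pair twice: A_1 vs B_1 appears under $A and again under $B. The within-group matrices also include the zero diagonal and both triangles. The sketch below keeps each pair exactly once, assuming the same <group>_<case> coding. It compares group prefixes with outer() and restricts the result to lower.tri(). The object names are illustrative.

```r
values <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
codes  <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3", "B_4", "C_1", "C_2", "C_3")
names(values) <- codes
full <- as.matrix(dist(values))
groups <- sub("_.*$", "", codes)   # assumes "<group>_<case>" codes

# same_group[i, j] is TRUE when cases i and j belong to the same group
same_group <- outer(groups, groups, "==")

# lower.tri() keeps each unordered pair once and drops the zero diagonal,
# so B_1-A_1 is not counted a second time as A_1-B_1.
keep_between <- lower.tri(full) & !same_group
keep_within  <- lower.tri(full) & same_group

between_d <- full[keep_between]
within_d  <- full[keep_within]

# Label each between-group distance by its pair, e.g. "B_1-A_1";
# which() walks the matrix in the same column-major order as full[keep_between].
idx <- which(keep_between, arr.ind = TRUE)
names(between_d) <- paste(codes[idx[, "row"]], codes[idx[, "col"]], sep = "-")

length(within_d)    # 12 = 3 + 6 + 3 within-group pairs
length(between_d)   # 33 between-group pairs
```

Here the two vectors have 12 and 33 entries. They therefore cannot be placed side by side as two equal-length data frame columns without padding.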
I don't see a way of representing this as a two-column data structure, but you can use melt
from the reshape2 package to get a three-column structure:
> library(reshape2)
> melt( as.matrix(dist.m)[grep("A", codes), grep("A", codes) ] )
Var1 Var2 value
1 A_1 A_1 0.000
2 A_2 A_1 0.298
3 A_3 A_1 0.398
4 A_1 A_2 0.298
5 A_2 A_2 0.000
6 A_3 A_2 0.100
7 A_1 A_3 0.398
8 A_2 A_3 0.100
9 A_3 A_3 0.000
That gives a rather long data frame to display, but it would be easy enough to put melt
inside the function call.
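Here is a base-R sketch of the "melt inside the call" idea that ends up close to what the question asked for. as.data.frame(as.table()) plays the role of reshape2's melt() here; this is a swapped-in base function, not the answer's code. Each pair is kept once via lower.tri(), and a type column marks it as within or between. The sketch assumes the <group>_<case> coding. The column names case1, case2, distance and type are illustrative.

```r
values <- c(0.002, 0.3, 0.4, 0.005, 0.6, 0.2, 0.001, 0.002, 0.3, 0.01)
codes  <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3", "B_4", "C_1", "C_2", "C_3")
names(values) <- codes
full <- as.matrix(dist(values))

# Base-R counterpart of melt() on a matrix: one row per cell,
# with row label, column label and value, in column-major order.
long <- as.data.frame(as.table(full), stringsAsFactors = FALSE)
names(long) <- c("case1", "case2", "distance")

# Keep each unordered pair once; lower.tri() is also column-major,
# so it lines up with the rows of long.
long <- long[as.vector(lower.tri(full)), ]

# Tag each pair by comparing group prefixes (assumes "<group>_<case>" codes).
long$type <- ifelse(sub("_.*$", "", long$case1) == sub("_.*$", "", long$case2),
                    "within", "between")

table(long$type)

# The two sets of distances, kept as a list because their lengths differ
by_type <- split(long$distance, long$type)
```

by_type$within and by_type$between then hold the two sets of distances from the question. They stay in a list (or in the long data frame with its type column) because the two sets differ in length.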
Source: https://stackoverflow.com/questions/17367277/how-to-extract-intragroup-and-intergroup-distances-from-a-distance-matrix-in-r