问题
This question already has an answer here:
- how to spread or cast multiple values in r [duplicate] 2 answers
I have a dataframe in R with 2186 obs of 38 vars. Rows have an ID variable referring to unique experiments and using
length(unique(df$ID))==nrow(df)
n_occur<-data.frame(table(df$ID))
I know 327 of my rows have repeated IDs with some IDs repeated more than once. I am trying to merge rows with the same ID as these aren't duplicates but just second, third etc. observations within a given experiment.
So for example if I had
x y ID
1 2 a
1 3 b
2 4 c
1 3 d
1 4 a
3 2 b
2 3 a
I would like to end up with
x y ID x2 y2 ID2 x3 y3 ID3
1 2 a 1 4 a 2 3 a
1 3 b 3 2 b na na na
2 4 c na na na na na na
1 3 d na na na na na na
I've seen similar questions for SQL and php but this hasn't helped me with my attempts in R. Any help would be gratefully appreciated.
回答1:
You could use the enhanced dcast function from the data.table package for that where you can select multiple value variables. With setDT(mydf) you convert your dataframe to a datatable and with [, idx := 1:.N, by = ID] you add a index by ID which you use subsequently in the dcast formula:
library(data.table)
dcast(setDT(mydf)[, idx := 1:.N, by = ID], ID ~ idx, value.var = c("x","y"))
Or with the development version of data.table (v1.9.7+), you can use the new rowid function:
dcast(setDT(mydf), ID ~ rowid(ID), value.var = c("x","y"))
gives:
ID x_1 x_2 x_3 y_1 y_2 y_3
1: a 1 1 2 2 4 3
2: b 1 3 NA 3 2 NA
3: c 2 NA NA 4 NA NA
4: d 1 NA NA 3 NA NA
Used data:
mydf <- structure(list(x = c(1L, 1L, 2L, 1L, 1L, 3L, 2L), y = c(2L, 3L,
4L, 3L, 4L, 2L, 3L), ID = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
1L), .Label = c("a", "b", "c", "d"), class = "factor")), .Names = c("x",
"y", "ID"), class = "data.frame", row.names = c(NA, -7L))
来源:https://stackoverflow.com/questions/35182749/merging-rows-with-the-same-id-variable