Question
I need to reorganize data from a CSV file that contains mostly repeating data. I have imported the data into R as a data frame, but I am having trouble with the following:
ID Language Author Keyword
12 eng Rob COLOR=Red
12 eng Rob SIZE=Large
12 eng Rob DD=1
15 eng John COLOR=Red
15 eng John SIZE=Medium
15 eng John DD=2
What I need to do is transform this into a single row, with each keyword in a separate column:
ID Language Author COLOR SIZE DD
12 eng Rob Red Large 1
Any ideas?
Answer 1:
Using the reshape2 package this is straightforward. With tt defined as in Gary's answer:
library("reshape2")
tt <- cbind(tt, colsplit(tt$Keyword, "=", c("Name", "Value")))
tt_new <- dcast(tt, ID + Language + Author ~ Name, value.var="Value")
which gives
> tt_new
ID Language Author COLOR DD SIZE
1 12 eng Rob Red 1 Large
2 15 eng John Red 2 Medium
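For readers without reshape2, the same split-then-widen idea can be sketched in base R. This is only an illustration under stated assumptions, not part of the original answer: the sample data is rebuilt inline, and the names kv and wide are hypothetical. It swaps colsplit for strsplit and dcast for base reshape().

```r
# Rebuild the sample data from the question (assumed layout).
tt <- read.table(header = TRUE, stringsAsFactors = FALSE,
                 text = "ID Language Author Keyword
12 eng Rob COLOR=Red
12 eng Rob SIZE=Large
12 eng Rob DD=1
15 eng John COLOR=Red
15 eng John SIZE=Medium
15 eng John DD=2")

# Split each "KEY=value" string into a two-column character matrix.
kv <- do.call(rbind, strsplit(tt$Keyword, "=", fixed = TRUE))
tt$Name  <- kv[, 1]
tt$Value <- kv[, 2]

# Widen: one row per ID/Language/Author, one column per keyword name.
wide <- reshape(tt[, c("ID", "Language", "Author", "Name", "Value")],
                idvar = c("ID", "Language", "Author"),
                timevar = "Name", direction = "wide")

# reshape() prefixes the new columns with "Value."; strip that prefix.
names(wide) <- sub("^Value\\.", "", names(wide))
wide
```

Unlike dcast, which sorts the new columns alphabetically, this keeps the keywords in order of first appearance (COLOR, SIZE, DD).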
Answer 2:
Using plyr and strsplit you can do something like this:
library(plyr)
res <- ddply(dat, .(ID, Language, Author), function(x) {
  unlist(sapply(strsplit(x$Keyword, '='), '[', 2))
})
colnames(res)[4:6] <- c('COLOR','SIZE','DD')
ID Language Author COLOR SIZE DD
1 12 eng Rob Red Large 1
2 15 eng John Red Medium 2
Edit: Here is a generalization that addresses @Brian's concern:
res <- ddply(dat, .(ID, Language, Author), function(x) {
  kv <- strsplit(x$Keyword, '=')
  setNames(sapply(kv, `[`, 2),
           sapply(kv, `[`, 1))
})
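The first version assigns column names by position, so it relies on the keywords appearing in the same order within every group. The generalization names each value by its own key instead. A minimal base R sketch of just that naming step, using a made-up keywords vector whose order is deliberately shuffled:

```r
# Hypothetical keywords for one group, in a different order than the question.
keywords <- c("SIZE=Large", "COLOR=Red", "DD=1")

# Split on "=" and name each value by the key it came with.
kv <- strsplit(keywords, "=", fixed = TRUE)
named <- setNames(sapply(kv, `[`, 2), sapply(kv, `[`, 1))

# Values can now be looked up by key, regardless of input order.
named[c("COLOR", "SIZE", "DD")]
```

Because each value carries its key as a name, no manual colnames() assignment is needed afterwards.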
Answer 3:
Try this using reshape2:
tt <- read.table(header=TRUE, text='ID Language Author Keyword
12 eng Rob COLOR=Red
12 eng Rob SIZE=Large
12 eng Rob DD=1
15 eng John COLOR=Red
15 eng John SIZE=Medium
15 eng John DD=2')
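library(reshape2)
tt$Keyword <- as.character(tt$Keyword)
# Split each "KEY=value" string once; sapply() returns plain character vectors,
# whereas lapply() would return lists that transform() expands into extra columns
kv <- strsplit(tt$Keyword, '=')
tt <- transform(tt, key_val = sapply(kv, `[`, 2),
                    key_var = sapply(kv, `[`, 1))
tt_new <- dcast(tt, ID + Language + Author ~ key_var, value.var='key_val')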
Source: https://stackoverflow.com/questions/15032270/reorganizing-data-from-3-rows-to-1