Reshaping a data frame with more than one measure variable

Submitted by 我的梦境 on 2019-11-28 21:18:54

Here's how you could do this with reshape(), from base R:

df2 <- reshape(df, direction="long",
               idvar = 1:2, varying = list(c(3,5), c(4,6)),
               v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))

## Checking the output    
rbind(head(df2, 3), tail(df2, 3))
#           student month  time   p1   p2
# 1.1.quiz1       1     1 quiz1 20.0 30.0
# 1.2.quiz1       1     2 quiz1 20.1 30.1
# 1.3.quiz1       1     3 quiz1 20.2 30.2
# 2.3.quiz2       2     3 quiz2 80.7 90.7
# 2.4.quiz2       2     4 quiz2 80.8 90.8
# 2.5.quiz2       2     5 quiz2 80.9 90.9
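The question's wide data frame df is not shown here, so the calls above cannot be run as they stand. The sketch below uses a hypothetical reconstruction of df, inferred from the printed outputs in this thread (two students, five months, and the columns quiz1p1, quiz1p2, quiz2p1, quiz2p2 in that order). It then runs the same reshape() call, with column names instead of numbers.

```r
## Hypothetical stand-in for the question's `df`, reconstructed from the
## printed outputs in this thread (the original data are not shown)
df <- data.frame(
  student = rep(1:2, each = 5),
  month   = rep(1:5, times = 2),
  quiz1p1 = 20 + 0:9 / 10,
  quiz1p2 = 30 + 0:9 / 10,
  quiz2p1 = 80 + 0:9 / 10,
  quiz2p2 = 90 + 0:9 / 10
)

## Each element of `varying` collects one measure across the two quizzes;
## `v.names` names those measures and `times` labels the quizzes
df2 <- reshape(df, direction = "long",
               idvar = c("student", "month"),
               varying = list(c("quiz1p1", "quiz2p1"),
                              c("quiz1p2", "quiz2p2")),
               v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))

## 10 student/month combinations x 2 quizzes, so 20 rows are expected
dim(df2)
rbind(head(df2, 3), tail(df2, 3))
```

Note that the order inside each element of varying must match times: the first name in each vector goes with "quiz1", the second with "quiz2".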

You can also use column names (instead of column numbers) for idvar and varying. It's more verbose, but it seems like better practice to me, because the call keeps working if the column order changes:

## The same operation as above, using just column *names*
df2 <- reshape(df, direction="long", idvar=c("student", "month"),
               varying = list(c("quiz1p1", "quiz2p1"), 
                              c("quiz1p2", "quiz2p2")), 
               v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))

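Alternatively, if dfL is your melted data frame (its variable column holds the original column names such as quiz1p1), I think this does what you want: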

# Break variable into two columns, one for the quiz and one for the part of the quiz
dfL <- transform(dfL, quiz = substr(variable, 1, 5),
                 part = substr(variable, 6, 7))

# Adjust your dcast call:
dcast(dfL, student + month + quiz ~ part)
#-----
#    student month  quiz   p1   p2
# 1        1     1 quiz1 20.0 30.0
# 2        1     1 quiz2 80.0 90.0
# 3        1     2 quiz1 20.1 30.1
# ...
# 18       2     4 quiz2 80.8 90.8
# 19       2     5 quiz1 20.9 30.9
# 20       2     5 quiz2 80.9 90.9
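The answer above starts from an existing dfL. As a self-contained sketch, assume dfL is simply melt(df, id.vars = c("student", "month")) applied to a hypothetical reconstruction of the question's df (inferred from the outputs in this thread). The substr() split relies on every name having the fixed-width layout "quizN" + "pN".

```r
library(reshape2)

## Hypothetical stand-in for the question's `df`, reconstructed from the
## printed outputs in this thread (the original data are not shown)
df <- data.frame(
  student = rep(1:2, each = 5),
  month   = rep(1:5, times = 2),
  quiz1p1 = 20 + 0:9 / 10,
  quiz1p2 = 30 + 0:9 / 10,
  quiz2p1 = 80 + 0:9 / 10,
  quiz2p2 = 90 + 0:9 / 10
)

## Assumed starting point: one row per student, month and original column
dfL <- melt(df, id.vars = c("student", "month"))

## Characters 1-5 hold the quiz ("quiz1"), characters 6-7 the part ("p1");
## substr() coerces the factor `variable` to character first
dfL <- transform(dfL, quiz = substr(variable, 1, 5),
                 part = substr(variable, 6, 7))

## One row per student/month/quiz, one column per part
wide <- dcast(dfL, student + month + quiz ~ part, value.var = "value")
head(wide)
```

Passing value.var = "value" explicitly just spells out what dcast() would otherwise guess from the column named value.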

A5C1D2H2I1M1N2O1R2T1

A very similar question was asked about half a year ago, and in my answer to it I wrote the following function:

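library(reshape2)
library(stringr)

## Melt the wide data, then pull every run of digits out of the old
## column names (e.g. "quiz1p2" -> "1", "2") into new identifier columns
melt.wide <- function(data, id.vars, new.names) {
  data.melt <- melt(data, id.vars = id.vars)
  ## One row per melted observation, one column per number found;
  ## this assumes every column name contains the same count of numbers
  new.vars <- data.frame(do.call(
    rbind, str_extract_all(as.character(data.melt$variable), "[0-9]+")))
  names(new.vars) <- new.names
  cbind(data.melt, new.vars)
}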
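The core of melt.wide() is the str_extract_all() plus do.call(rbind, ...) step. The small sketch below applies it to the four column names used in this thread to show what it produces on its own.

```r
library(stringr)

## Column names like those melted in this thread
vars <- c("quiz1p1", "quiz1p2", "quiz2p1", "quiz2p2")

## str_extract_all() returns a list with one character vector per name,
## holding every run of digits it found: c("1", "1"), c("1", "2"), ...
pieces <- str_extract_all(vars, "[0-9]+")

## rbind() stacks those vectors into a matrix: row i = name i,
## column j = the j-th number in that name (here: quiz, then part)
m <- do.call(rbind, pieces)
colnames(m) <- c("Quiz", "Part")
m
```

Because rbind() recycles shorter vectors and ignores zero-length ones, names with differing counts of numbers would silently produce misaligned columns, so it is worth checking the names before relying on this step.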

You can use the function to "melt" your data as follows:

dfL <- melt.wide(df, id.vars = 1:2, new.names = c("Quiz", "Part"))
head(dfL)
#   student month variable value Quiz Part
# 1       1     1  quiz1p1  20.0    1    1
# 2       1     2  quiz1p1  20.1    1    1
# 3       1     3  quiz1p1  20.2    1    1
# 4       1     4  quiz1p1  20.3    1    1
# 5       1     5  quiz1p1  20.4    1    1
# 6       2     1  quiz1p1  20.5    1    1
tail(dfL)
#    student month variable value Quiz Part
# 35       1     5  quiz2p2  90.4    2    2
# 36       2     1  quiz2p2  90.5    2    2
# 37       2     2  quiz2p2  90.6    2    2
# 38       2     3  quiz2p2  90.7    2    2
# 39       2     4  quiz2p2  90.8    2    2
# 40       2     5  quiz2p2  90.9    2    2

Once the data are in this form, it is much easier to use dcast() to get whatever shape you want. For example:

head(dcast(dfL, student + month + Quiz ~ Part))
#   student month Quiz    1    2
# 1       1     1    1 20.0 30.0
# 2       1     1    2 80.0 90.0
# 3       1     2    1 20.1 30.1
# 4       1     2    2 80.1 90.1
# 5       1     3    1 20.2 30.2
# 6       1     3    2 80.2 90.2
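As another example of choosing the shape with dcast(), the sketch below puts both Quiz and Part on the right-hand side of the formula, which gives one row per student and month again. To stay self-contained it rebuilds the long data inline (melt, then extract the numbers, as melt.wide() does) from a hypothetical reconstruction of the question's df inferred from the outputs in this thread.

```r
library(reshape2)
library(stringr)

## Hypothetical stand-in for the question's `df`, reconstructed from the
## printed outputs in this thread (the original data are not shown)
df <- data.frame(
  student = rep(1:2, each = 5),
  month   = rep(1:5, times = 2),
  quiz1p1 = 20 + 0:9 / 10,
  quiz1p2 = 30 + 0:9 / 10,
  quiz2p1 = 80 + 0:9 / 10,
  quiz2p2 = 90 + 0:9 / 10
)

## Same long form that melt.wide() produces: melt, then split the
## quiz and part numbers out of names like "quiz1p2"
dfL <- melt(df, id.vars = c("student", "month"))
nums <- do.call(rbind, str_extract_all(as.character(dfL$variable), "[0-9]+"))
dfL$Quiz <- nums[, 1]
dfL$Part <- nums[, 2]

## Both Quiz and Part on the right-hand side: one column per pair,
## labelled by joining the values with "_" (dcast's default separator)
wide <- dcast(dfL, student + month ~ Quiz + Part, value.var = "value")
wide
```

Here column 1_2, for instance, holds quiz 1, part 2, so this shape carries the same values as the original wide columns.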