What reshaping problems can melt/cast not solve in a single step?

Submitted by 試著忘記壹切 on 2019-11-29 11:55:06

... almost a year later...

This came to mind the other day, and I have a sneaking suspicion that it is what you tried to show in your example, but unfortunately, your example code doesn't run!

melt sometimes takes things a bit too far for me when making my data "long". Sometimes, even though the result would not strictly be called "tidy data", I prefer a "semi-long" data.frame. This is easy with base R's reshape, but takes a few extra steps with the "reshape2" package, as demonstrated below:

Prerequisite: sample data.

set.seed(1)
myDf <- data.frame(
  ID.1 = sample(letters[1:5], 5, replace = TRUE),
  ID.2 = 1:5,
  V.1 = sample(10:14, 5, replace = TRUE),
  V.2 = sample(5:9, 5, replace = TRUE),
  V.3 = sample(3:14, 5, replace = TRUE),
  W.1 = sample(LETTERS, 5, replace = TRUE),
  W.2 = sample(LETTERS, 5, replace = TRUE),
  W.3 = sample(LETTERS, 5, replace = TRUE)
)
myDf
#   ID.1 ID.2 V.1 V.2 V.3 W.1 W.2 W.3
# 1    b    1  14   6   8   Y   K   M
# 2    b    2  14   5  11   F   A   P
# 3    c    3  13   8  14   Q   J   M
# 4    e    4  13   6   7   D   W   E
# 5    b    5  10   8  12   G   I   V

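This is the "semi-long" output that I'm looking for, easily achieved with base R's reshape. By default, reshape splits the varying column names at the "." to work out the variable names (V, W) and the times (1, 2, 3).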

reshape(myDf, direction = "long", idvar=1:2, varying = 3:ncol(myDf))
#       ID.1 ID.2 time  V W
# b.1.1    b    1    1 14 Y
# b.2.1    b    2    1 14 F
# c.3.1    c    3    1 13 Q
# e.4.1    e    4    1 13 D
# b.5.1    b    5    1 10 G
# b.1.2    b    1    2  6 K
# b.2.2    b    2    2  5 A
# c.3.2    c    3    2  8 J
# e.4.2    e    4    2  6 W
# b.5.2    b    5    2  8 I
# b.1.3    b    1    3  8 M
# b.2.3    b    2    3 11 P
# c.3.3    c    3    3 14 M
# e.4.3    e    4    3  7 E
# b.5.3    b    5    3 12 V
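
If you would rather not rely on reshape guessing the variable names and times from the column names, the same kind of call can be spelled out. This is only a sketch on a small made-up data.frame (the name wide and its values are illustrative, not the myDf above):

```r
# Small illustrative data set (hypothetical values, not the answer's myDf)
wide <- data.frame(
  ID.1 = c("a", "b"),
  ID.2 = 1:2,
  V.1 = c(10L, 11L),
  V.2 = c(5L, 6L),
  W.1 = c("X", "Y"),
  W.2 = c("P", "Q"),
  stringsAsFactors = FALSE
)

# Spell out what reshape would otherwise infer from the names:
# one list element per output column, one entry per time point.
long <- reshape(
  wide,
  direction = "long",
  idvar = c("ID.1", "ID.2"),
  varying = list(c("V.1", "V.2"), c("W.1", "W.2")),
  v.names = c("V", "W"),
  timevar = "time",
  times = 1:2
)
long
```

Each element of varying becomes one column in the result, so V stays integer and W stays character.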

melt is great if you want the equivalent of stack, especially since stack discards all factor variables, which is frustrating when read.table and family default to stringsAsFactors = TRUE (as they did before R 4.0.0). (You can make stack work, but you need to convert the relevant columns to character first.) But it is not what I'm looking for, in particular because of how it handles the "variable" column.
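
For what it's worth, the stack workaround mentioned in passing looks roughly like this. It is a minimal sketch on a made-up two-column data.frame (fac_df is a hypothetical name), not something from the question:

```r
# Hypothetical data.frame whose columns are all factors
fac_df <- data.frame(
  W.1 = factor(c("Y", "F")),
  W.2 = factor(c("K", "A"))
)

# stack() only keeps plain vector columns, so convert factors first
is_fac <- vapply(fac_df, is.factor, logical(1))
fac_df[is_fac] <- lapply(fac_df[is_fac], as.character)

stacked <- stack(fac_df)
stacked
```

The melt call that follows copes with such columns without the conversion, at the cost of the single "variable" column mentioned above.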

library(reshape2)
myDfL <- melt(myDf, id.vars=1:2)
head(myDfL)
#   ID.1 ID.2 variable value
# 1    b    1      V.1    14
# 2    b    2      V.1    14
# 3    c    3      V.1    13
# 4    e    4      V.1    13
# 5    b    5      V.1    10
# 6    b    1      V.2     6

To fix this, one needs to first split the "variable" column with colsplit, and then use dcast to get the same layout as you would get from reshape (the rows come out in a different order).

myDfL <- cbind(myDfL, colsplit(myDfL$variable, "\\.", names=c("var", "time")))
dcast(myDfL, ID.1 + ID.2 + time ~ var, value.var="value")
#    ID.1 ID.2 time  V W
# 1     b    1    1 14 Y
# 2     b    1    2  6 K
# 3     b    1    3  8 M
# 4     b    2    1 14 F
# 5     b    2    2  5 A
# 6     b    2    3 11 P
# 7     b    5    1 10 G
# 8     b    5    2  8 I
# 9     b    5    3 12 V
# 10    c    3    1 13 Q
# 11    c    3    2  8 J
# 12    c    3    3 14 M
# 13    e    4    1 13 D
# 14    e    4    2  6 W
# 15    e    4    3  7 E
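
One caveat worth checking on your own data: melt puts the V.* and W.* values into a single "value" column, so they have to share one type, and V will most likely come back from dcast as character rather than integer. Assuming that is the case, type.convert can restore it. The res data.frame below is a hand-built stand-in for a few rows of the result above, not actual reshape2 output:

```r
# Hypothetical stand-in for a few rows of the dcast result,
# assuming its V column came back as character.
res <- data.frame(
  time = 1:3,
  V = c("14", "6", "8"),
  W = c("Y", "K", "M"),
  stringsAsFactors = FALSE
)

# Re-guess the type of V; as.is = TRUE avoids creating factors
res$V <- type.convert(res$V, as.is = TRUE)
str(res)
```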