wide to long multiple measures each time

匿名 (未验证) 提交于 2019-12-03 01:47:02

问题:

I know the wide to long has been asked way too many times on here but I can't figure out how to turn the following into long format. Shoot I even asked one of the wide to long with 2 repeated measures on SO. I'm becoming frustrated with my inability to convert my data. How can I turn this (variable order doesn't matter):

      id trt    work.T1   play.T1   talk.T1   total.T1    work.T2    play.T2   talk.T2  total.T2 1   x1.1 cnt 0.34434350 0.7841665 0.1079332 0.88803151 0.64836951 0.87954320 0.7233519 0.5630988 2   x1.2  tr 0.06132255 0.8426960 0.3338658 0.04685878 0.23478670 0.19711687 0.5164015 0.7617968 3   x1.3  tr 0.36897981 0.1834721 0.3241316 0.76904051 0.07629721 0.06945971 0.4118995 0.7452974 4   x1.4  tr 0.40759356 0.5285396 0.5654258 0.23022542 0.92309504 0.15733957 0.4132653 0.7078273 5   x1.5 cnt 0.91433676 0.7029476 0.2031782 0.31518412 0.14721669 0.33345678 0.7620444 0.9868082 6   x1.6  tr 0.88870525 0.9132728 0.2197045 0.28266959 0.82239037 0.18006177 0.2591765 0.4516309 7   x1.7 cnt 0.98373218 0.2591739 0.6331153 0.71319565 0.41351839 0.14648269 0.7631898 0.1182174 8   x1.8  tr 0.47719528 0.7926248 0.3525205 0.86213792 0.61252061 0.29057544 0.9824048 0.2386353 9   x1.9  tr 0.69350823 0.6144696 0.8568732 0.10632352 0.06812050 0.93606889 0.6701190 0.4705228 10 x1.10 cnt 0.42574646 0.7006205 0.9507216 0.55032776 0.90413220 0.10246047 0.5899279 0.3523231 

into this:

      id trt time       work       play      talk      total 1   x1.1 cnt    1 0.34434350 0.78416653 0.1079332 0.88803151 2   x1.2  tr    1 0.06132255 0.84269599 0.3338658 0.04685878 3   x1.3  tr    1 0.36897981 0.18347215 0.3241316 0.76904051 4   x1.4  tr    1 0.40759356 0.52853960 0.5654258 0.23022542 5   x1.5 cnt    1 0.91433676 0.70294755 0.2031782 0.31518412 6   x1.6  tr    1 0.88870525 0.91327276 0.2197045 0.28266959 7   x1.7 cnt    1 0.98373218 0.25917392 0.6331153 0.71319565 8   x1.8  tr    1 0.47719528 0.79262477 0.3525205 0.86213792 9   x1.9  tr    1 0.69350823 0.61446955 0.8568732 0.10632352 10 x1.10 cnt    1 0.42574646 0.70062053 0.9507216 0.55032776 11  x1.1 cnt    2 0.64836951 0.87954320 0.7233519 0.56309884 12  x1.2  tr    2 0.23478670 0.19711687 0.5164015 0.76179680 13  x1.3  tr    2 0.07629722 0.06945971 0.4118995 0.74529740 14  x1.4  tr    2 0.92309504 0.15733957 0.4132653 0.70782726 15  x1.5 cnt    2 0.14721669 0.33345678 0.7620444 0.98680824 16  x1.6  tr    2 0.82239038 0.18006177 0.2591765 0.45163091 17  x1.7 cnt    2 0.41351839 0.14648269 0.7631898 0.11821741 18  x1.8  tr    2 0.61252061 0.29057544 0.9824048 0.23863532 19  x1.9  tr    2 0.06812050 0.93606889 0.6701190 0.47052276 20 x1.10 cnt    2 0.90413220 0.10246047 0.5899279 0.35232307 

The Data Set

id 

Thank you in advance!

EDIT: Something screwy happened when I was using set.seed (certainly an error I did). The actually data above is not the data you'll get if you use set.seed(10). I'm leaving the error for historical accuracy and it really doesn't affect the solutions people gave.

回答1:

This is pretty close and changing the names of columns should be within your skillset:

reshape(DF,         varying=c(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ),         direction="long") 

EDIT: Adding a version that is almost an exact solution:

reshape(DF, varying=list(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ),          v.names=c("Work", "Play", "Talk", "Total"),            # that was needed after changed 'varying' arg to a list to allow 'times'          direction="long",           times=1:2,        # substitutes number for T1 and T2         timevar="times")  # to name the time col 


回答2:

The most concise way is to use tidyr combined with dplyr library.

library(tidyr) library(dplyr) result %   # transfer to 'long' format   gather(loc, value, work.T1:total.`enter code here`T2) %>%   # separate the column into location and time   separate(loc, into = c('loc', 'time'), '\\.') %>%   # transfer to 'short' format   spread(loc, value) %>%   mutate(time = as.numeric(substr(time, 2, 2))) %>%   arrange(time) 

tidyr is designed specifically to make data tidy.



回答3:

Oddly enough I don't seem to get the same numbers as you (which I should since we both used set.seed(10)?) but otherwise this seems to do the trick:

library(reshape)  #this might work with reshape2 as well, I haven't tried ... DF2 

It's more complicated than @DWin's answer but maybe (?) more general.



回答4:

If you really didn't want the "T" in the "time" variable in the output, couldn't you simply do the following?

names(DF) = sub("T", "", names(DF)) reshape(DF, direction="long", varying=3:10) 

Or, without changing names(DF), you could simply set the sep= argument to include "T":

reshape(DF, direction="long", varying=3:10, sep=".T") 

I'm a bit confused, though. As Ben Bolker pointed out a in his comment, your "dataset code" doesn't provide the same numbers as what you have. Also, the output of DWin and mine matches perfectly, but it does not match with the "into this" output that you have in your original question.

I checked this by creating one data frame named "DWin" with his results, and one data frame named "mine" with my results and compared them using DWin == mine.

Can you verify that the output we've gotten is actually what you needed?



回答5:

Another way to approach the problem that requires very little code but would likely be slower,:

DF.1 


标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!