Reshape wide format, to multi-column long format

一生所求 2020-12-14 10:01

I want to reshape a wide format dataset that has multiple tests which are measured at 3 time points:

   ID   Test Year   Fall Spring Winter
    1   1   2008          


        
4 Answers
  • 2020-12-14 10:09

    Here is an alternative using the base reshape function. It requires calling reshape twice, so there may be a simpler way.

    Assuming your dataset is called df1:

    tmp <- reshape(df1,idvar=c("ID","Year"),timevar="Test",direction="wide")
    result <- reshape(
       tmp,
       idvar=c("ID","Year"),
       varying=list(3:5,6:8),
       v.names=c("Test1","Test2"),
       times=c("Fall","Spring","Winter"),
       direction="long"
    )
    

    Which gives:

    > result
                  ID Year   time Test1 Test2
    1.2008.Fall    1 2008   Fall    15    22
    1.2009.Fall    1 2009   Fall    12    10
    2.2008.Fall    2 2008   Fall    12    13
    2.2009.Fall    2 2009   Fall    16    23
    3.2008.Fall    3 2008   Fall    11    17
    3.2009.Fall    3 2009   Fall    13    14
    1.2008.Spring  1 2008 Spring    16    22
    1.2009.Spring  1 2009 Spring    13    14
    2.2008.Spring  2 2008 Spring    13    11
    2.2009.Spring  2 2009 Spring    14    20
    3.2008.Spring  3 2008 Spring    12    12
    3.2009.Spring  3 2009 Spring    11     9
    1.2008.Winter  1 2008 Winter    19    24
    1.2009.Winter  1 2009 Winter    27    20
    2.2008.Winter  2 2008 Winter    25    29
    2.2009.Winter  2 2009 Winter    21    26
    3.2008.Winter  3 2008 Winter    22    23
    3.2009.Winter  3 2009 Winter    27    31
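
To rerun the two calls above end to end, the input can be rebuilt from the values in the printed result. The sketch below assumes `df1` had one row per ID, Test, and Year with `Fall`, `Spring`, and `Winter` columns (the header shown in the question); the row order is an assumption.

```r
# Input rebuilt from the printed result above; the row order is an assumption.
df1 <- data.frame(
  ID     = rep(1:3, each = 4),
  Test   = rep(c(1, 1, 2, 2), times = 3),
  Year   = rep(c(2008, 2009), times = 6),
  Fall   = c(15, 12, 22, 10, 12, 16, 13, 23, 11, 13, 17, 14),
  Spring = c(16, 13, 22, 14, 13, 14, 11, 20, 12, 11, 12,  9),
  Winter = c(19, 27, 24, 20, 25, 21, 29, 26, 22, 27, 23, 31)
)

# Step 1: spread the two tests into columns (Fall.1 ... Winter.2).
tmp <- reshape(df1, idvar = c("ID", "Year"), timevar = "Test",
               direction = "wide")

# Step 2: stack the three seasons back into rows, one value column per test.
result <- reshape(
  tmp,
  idvar     = c("ID", "Year"),
  varying   = list(3:5, 6:8),
  v.names   = c("Test1", "Test2"),
  times     = c("Fall", "Spring", "Winter"),
  direction = "long"
)

dim(result)
```

The positional `varying = list(3:5, 6:8)` depends on the column order of `tmp`, so it is worth checking `names(tmp)` before the second call if the input has extra columns.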
    
  • Sticking with base R, this is another good candidate for the "stack + reshape" routine. Assuming our dataset is called "mydf":

    mydf.temp <- data.frame(mydf[1:3], stack(mydf[4:6]))
    mydf2 <- reshape(mydf.temp, direction = "wide", 
                     idvar=c("ID", "Year", "ind"), 
                     timevar="Test")
    names(mydf2) <- c("ID", "Year", "Time", "Test1", "Test2")
    mydf2
    #    ID Year   Time Test1 Test2
    # 1   1 2008   Fall    15    22
    # 2   1 2009   Fall    12    10
    # 5   2 2008   Fall    12    13
    # 6   2 2009   Fall    16    23
    # 9   3 2008   Fall    11    17
    # 10  3 2009   Fall    13    14
    # 13  1 2008 Spring    16    22
    # 14  1 2009 Spring    13    14
    # 17  2 2008 Spring    13    11
    # 18  2 2009 Spring    14    20
    # 21  3 2008 Spring    12    12
    # 22  3 2009 Spring    11     9
    # 25  1 2008 Winter    19    24
    # 26  1 2009 Winter    27    20
    # 29  2 2008 Winter    25    29
    # 30  2 2009 Winter    21    26
    # 33  3 2008 Winter    22    23
    # 34  3 2009 Winter    27    31
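
The key step here is `stack()`, which turns several columns into a two-column data frame: `values` holds the stacked numbers and `ind` records which column each one came from. A minimal sketch on a toy data frame (the name `toy` and its values are made up for illustration):

```r
# Toy data, invented for illustration only.
toy <- data.frame(Fall = c(15, 12), Spring = c(16, 13))

stacked <- stack(toy)

names(stacked)             # "values" "ind"
as.character(stacked$ind)  # "Fall" "Fall" "Spring" "Spring"
stacked$values             # 15 12 16 13
```

This is why `ind` appears in `idvar` in the reshape call, and why the final columns are renamed afterwards.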
    
  • 2020-12-14 10:24

    Using reshape2:

    library(reshape2)

    # Thanks to Ista for helping with direct naming using "variable.name"
    df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
    df.m <- transform(df.m, Test = paste0("Test", Test))
    dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
    

    Update: using data.table melt/dcast (versions >= 1.9.0):

    Starting with version 1.9.0, data.table imports the reshape2 package and implements fast melt and dcast methods in C for data.tables. A speed comparison on bigger data is shown below.

    For more information, see the data.table NEWS.

    require(data.table) ## ver. >=1.9.0
    require(reshape2)
    
    dt <- as.data.table(df, key=c("ID", "Test", "Year"))
    dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
    dt.m[, Test := paste0("Test", Test)]
    dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
    

    At the moment, you'll have to write dcast.data.table explicitly, as it's not an S3 generic in reshape2 yet.
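
In later data.table releases (1.9.6 onward, to the best of my knowledge), `melt` and `dcast` are generics exported by data.table itself, so loading reshape2 and spelling out `dcast.data.table` should no longer be needed. A minimal sketch under that assumption, using a few rows taken from the output shown in the other answers:

```r
library(data.table)  # assumes a release where melt/dcast are data.table generics

# A small subset of the example data (ID 1 only).
dt <- data.table(
  ID     = 1,
  Test   = c(1, 1, 2, 2),
  Year   = c(2008, 2009, 2008, 2009),
  Fall   = c(15, 12, 22, 10),
  Spring = c(16, 13, 22, 14),
  Winter = c(19, 27, 24, 20)
)

# Wide seasons -> long: one row per ID/Test/Year/Time.
dt.m <- melt(dt, id.vars = c("ID", "Test", "Year"), variable.name = "Time")

# Label the tests so the cast columns become Test1, Test2.
dt.m[, Test := paste0("Test", Test)]

# Long -> one column per test.
res <- dcast(dt.m, ID + Year + Time ~ Test, value.var = "value")
res
```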


    Benchmarking on bigger data:

    # generate data:
    set.seed(45L)
    DT <- data.table(ID = sample(1e2, 1e7, TRUE), 
            Test = sample(1e3, 1e7, TRUE), 
            Year = sample(2008:2014, 1e7,TRUE), 
            Fall = sample(50, 1e7, TRUE), 
            Spring = sample(50, 1e7,TRUE), 
            Winter = sample(50, 1e7, TRUE))
    DF <- as.data.frame(DT)
    

    reshape2 timings:

    reshape2_melt <- function(df) {
        df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
    }
    # min. of three consecutive runs
    system.time(df.m <- reshape2_melt(DF))
    #   user  system elapsed 
    # 43.319   4.909  48.932 
    
    df.m <- transform(df.m, Test = paste0("Test", Test))
    
    reshape2_cast <- function(df) {
        dcast(df, ID + Year + Time ~ Test, value.var = "value")
    }
    # min. of three consecutive runs
    system.time(reshape2_cast(df.m))
    #   user  system elapsed 
    # 57.728   9.712  69.573 
    

    data.table timings:

    DT_melt <- function(dt) {
        dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
    }
    # min. of three consecutive runs
    system.time(dt.m <- DT_melt(DT))
    #   user  system elapsed 
    #  0.276   0.001   0.279 
    
    dt.m[, Test := paste0("Test", Test)]
    
    DT_cast <- function(dt) {
        dcast.data.table(dt, ID + Year + Time ~ Test, value.var = "value")
    }
    # min. of three consecutive runs
    system.time(DT_cast(dt.m))
    #   user  system elapsed 
    # 12.732   0.825  14.006 
    

    In these timings, melt.data.table is ~175x faster than reshape2::melt, and dcast.data.table is ~5x faster than reshape2::dcast.

  • 2020-12-14 10:30

    tidyverse/tidyr solution:

    library(dplyr)
    library(tidyr)
    
    df %>% 
      gather("Time", "Value", Fall, Spring, Winter) %>% 
      spread(Test, Value, sep = "")
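
`gather()` and `spread()` still work but are marked as superseded in current tidyr; `pivot_longer()` and `pivot_wider()` (tidyr 1.0.0 and later) are the suggested replacements. A sketch of the same reshape with those functions, using a few rows taken from the output shown in the other answers:

```r
library(tidyr)  # assumes tidyr >= 1.0.0

# A small subset of the example data (ID 1 only).
df <- data.frame(
  ID     = 1,
  Test   = c(1, 1, 2, 2),
  Year   = c(2008, 2009, 2008, 2009),
  Fall   = c(15, 12, 22, 10),
  Spring = c(16, 13, 22, 14),
  Winter = c(19, 27, 24, 20)
)

# Wide seasons -> long: one row per ID/Test/Year/Time.
long <- pivot_longer(df, cols = c(Fall, Spring, Winter),
                     names_to = "Time", values_to = "Value")

# Long -> one column per test, named Test1 and Test2.
wide <- pivot_wider(long, names_from = Test, values_from = Value,
                    names_prefix = "Test")
wide
```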
    