Reshape wide format to multi-column long format


I want to reshape a wide format dataset that has multiple tests which are measured at 3 time points:

   ID   Test Year   Fall Spring Winter
    1   1   2008


                      
4 answers

眼角桃花, 2020-12-14 10:09

Here is an alternative using base R's reshape(). It takes two reshape() calls, so there may be a simpler way.

Assuming your dataset is called df1:

tmp <- reshape(df1,idvar=c("ID","Year"),timevar="Test",direction="wide")
result <- reshape(
   tmp,
   idvar=c("ID","Year"),
   varying=list(3:5,6:8),
   v.names=c("Test1","Test2"),
   times=c("Fall","Spring","Winter"),
   direction="long"
)


Which gives:

> result
              ID Year   time Test1 Test2
1.2008.Fall    1 2008   Fall    15    22
1.2009.Fall    1 2009   Fall    12    10
2.2008.Fall    2 2008   Fall    12    13
2.2009.Fall    2 2009   Fall    16    23
3.2008.Fall    3 2008   Fall    11    17
3.2009.Fall    3 2009   Fall    13    14
1.2008.Spring  1 2008 Spring    16    22
1.2009.Spring  1 2009 Spring    13    14
2.2008.Spring  2 2008 Spring    13    11
2.2009.Spring  2 2009 Spring    14    20
3.2008.Spring  3 2008 Spring    12    12
3.2009.Spring  3 2009 Spring    11     9
1.2008.Winter  1 2008 Winter    19    24
1.2009.Winter  1 2009 Winter    27    20
2.2008.Winter  2 2008 Winter    25    29
2.2009.Winter  2 2009 Winter    21    26
3.2008.Winter  3 2008 Winter    22    23
3.2009.Winter  3 2009 Winter    27    31
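
The sample data in the question is cut off, so here is a minimal sketch that rebuilds an input frame from the values printed in the result above and reruns the two reshape() calls. The name df1 follows this answer. The reconstruction of the input, including its row order, is an assumption.

```r
# Hypothetical input, reconstructed from the printed result above.
# Row order (ID, then Test, then Year) is an assumption.
df1 <- data.frame(
  ID     = rep(1:3, each = 4),
  Test   = rep(c(1, 1, 2, 2), times = 3),
  Year   = rep(c(2008, 2009), times = 6),
  Fall   = c(15, 12, 22, 10, 12, 16, 13, 23, 11, 13, 17, 14),
  Spring = c(16, 13, 22, 14, 13, 14, 11, 20, 12, 11, 12, 9),
  Winter = c(19, 27, 24, 20, 25, 21, 29, 26, 22, 27, 23, 31)
)

# Step 1: spread the two tests side by side (Fall.1 ... Winter.2)
tmp <- reshape(df1, idvar = c("ID", "Year"), timevar = "Test",
               direction = "wide")

# Step 2: stack the three seasons, keeping one column per test
result <- reshape(
  tmp,
  idvar     = c("ID", "Year"),
  varying   = list(3:5, 6:8),   # columns 3:5 are Test 1, 6:8 are Test 2
  v.names   = c("Test1", "Test2"),
  times     = c("Fall", "Spring", "Winter"),
  direction = "long"
)

dim(result)
```

The intermediate tmp has one row per ID/Year pair, which is why the second call can refer to the test columns by position.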

                                                                        
                                                        
            
            
              
                
不要未来只要你来, 2020-12-14 10:18

Sticking with base R, this is another good candidate for the "stack + reshape" routine. Assuming our dataset is called "mydf":

mydf.temp <- data.frame(mydf[1:3], stack(mydf[4:6]))
mydf2 <- reshape(mydf.temp, direction = "wide", 
                 idvar=c("ID", "Year", "ind"), 
                 timevar="Test")
names(mydf2) <- c("ID", "Year", "Time", "Test1", "Test2")
mydf2
#    ID Year   Time Test1 Test2
# 1   1 2008   Fall    15    22
# 2   1 2009   Fall    12    10
# 5   2 2008   Fall    12    13
# 6   2 2009   Fall    16    23
# 9   3 2008   Fall    11    17
# 10  3 2009   Fall    13    14
# 13  1 2008 Spring    16    22
# 14  1 2009 Spring    13    14
# 17  2 2008 Spring    13    11
# 18  2 2009 Spring    14    20
# 21  3 2008 Spring    12    12
# 22  3 2009 Spring    11     9
# 25  1 2008 Winter    19    24
# 26  1 2009 Winter    27    20
# 29  2 2008 Winter    25    29
# 30  2 2009 Winter    21    26
# 33  3 2008 Winter    22    23
# 34  3 2009 Winter    27    31
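
To make the "stack" half of the routine easier to follow, here is a small sketch of what stack() returns and how data.frame() recycles the ID columns alongside it. The frames toy and ids are hypothetical and only illustrate the idea.

```r
# Hypothetical two-row frames, only to show what stack() does
ids <- data.frame(ID = 1:2, Year = c(2008, 2009))
toy <- data.frame(Fall = c(15, 12), Spring = c(16, 13), Winter = c(19, 27))

# stack() puts all columns into one 'values' vector and records
# the source column name in the factor 'ind'
stacked <- stack(toy)

# data.frame() recycles the shorter ID columns a whole number of times,
# so each stacked value keeps its ID and Year
combined <- data.frame(ids, stacked)
combined
```

After this step the season lives in ind, so a single reshape(..., direction = "wide") on Test is enough.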

                                                                        
                                                        
            
            
              
                
温柔的废话, 2020-12-14 10:24

Using reshape2:

library(reshape2)

# Thanks to Ista for helping with direct naming using "variable.name"
df.m <- melt(df, id.vars = c("ID", "Test", "Year"), variable.name = "Time")
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")




Update: Using data.table melt/cast from versions >= 1.9.0:

From version 1.9.0, data.table imports the reshape2 package and implements fast melt and dcast methods in C for data.tables. A speed comparison on bigger data is shown below.

For more details, see the data.table NEWS.

require(data.table) ## ver. >=1.9.0
require(reshape2)

dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")


At the moment, you'll have to write dcast.data.table explicitly, as dcast is not an S3 generic in reshape2 yet.
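
As far as I know, later data.table releases (1.9.6 and up) export their own melt() and dcast() generics, so reshape2 does not need to be loaded. The sketch below assumes such a version is installed. The small input dt is hypothetical.

```r
library(data.table)  # assumes a version that exports melt() and dcast()

# Hypothetical input: one ID and year, two tests, three seasons
dt <- data.table(ID = c(1, 1), Test = c(1, 2), Year = 2008,
                 Fall = c(15, 22), Spring = c(16, 22), Winter = c(19, 24))

# Long format: one row per ID/Test/Year/season
dt.m <- melt(dt, id.vars = c("ID", "Test", "Year"), variable.name = "Time")

# Turn the test number into the future column name
dt.m[, Test := paste0("Test", Test)]

# One column per test; dcast() dispatches to the data.table method
wide <- dcast(dt.m, ID + Year + Time ~ Test, value.var = "value")
wide
```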



Benchmarking on bigger data:

# generate data:
set.seed(45L)
DT <- data.table(ID = sample(1e2, 1e7, TRUE), 
        Test = sample(1e3, 1e7, TRUE), 
        Year = sample(2008:2014, 1e7,TRUE), 
        Fall = sample(50, 1e7, TRUE), 
        Spring = sample(50, 1e7,TRUE), 
        Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)


reshape2 timings:

reshape2_melt <- function(df) {
    df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
#   user  system elapsed 
# 43.319   4.909  48.932 

df.m <- transform(df.m, Test = paste0("Test", Test))

reshape2_cast <- function(df.m) {
    dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
#   user  system elapsed 
# 57.728   9.712  69.573 


data.table timings:

DT_melt <- function(dt) {
    dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- DT_melt(DT))
#   user  system elapsed 
#  0.276   0.001   0.279 

dt.m[, Test := paste0("Test", Test)]

DT_cast <- function(dt.m) {
    dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
#   user  system elapsed 
# 12.732   0.825  14.006 


melt.data.table is ~175x faster than reshape2::melt and dcast.data.table is ~5x faster than reshape2::dcast.
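
For reference, the ratios come straight from the elapsed times printed above. A quick check, using only those four numbers:

```r
# Elapsed seconds copied from the timings above
melt_speedup <- 48.932 / 0.279    # reshape2 melt vs. data.table melt
cast_speedup <- 69.573 / 14.006   # reshape2 dcast vs. dcast.data.table

round(c(melt = melt_speedup, dcast = cast_speedup), 1)
```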
                                                                        
                                                        
            
            
              
                
清酒与你, 2020-12-14 10:30

tidyverse/tidyr solution:

library(dplyr)
library(tidyr)

df %>% 
  gather("Time", "Value", Fall, Spring, Winter) %>% 
  spread(Test, Value, sep = "")
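
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above, assuming such a version is installed and using a hypothetical small df, might look like this:

```r
library(tidyr)  # assumes tidyr >= 1.0.0

# Hypothetical input: one ID and year, two tests, three seasons
df <- data.frame(ID = c(1, 1), Test = c(1, 2), Year = 2008,
                 Fall = c(15, 22), Spring = c(16, 22), Winter = c(19, 24))

# Long format: season names go to Time, measurements to Value
long <- pivot_longer(df, cols = c(Fall, Spring, Winter),
                     names_to = "Time", values_to = "Value")

# One column per test, named Test1, Test2, ...
wide <- pivot_wider(long, names_from = Test, values_from = Value,
                    names_prefix = "Test")
wide
```

Here names_prefix = "Test" plays the role that sep = "" plays in the spread() call.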

                                                                        
                                                        
            
            
              
                