Pivoting rows into columns

我寻月下人不归 2021-02-05 15:57

Suppose (to simplify) I have a table containing some control vs. treatment data:

Which, Color, Response, Count
Control, Red, 2, 10
Control, Blue, 3, 20
Treatment         


        
4 answers
  •  無奈伤痛
    2021-02-05 16:15

    To add to the options (many years later)...

    The typical approach in base R uses the reshape function, which is generally unpopular because its many arguments take time to master. It is fairly efficient on smaller datasets but doesn't always scale well.

    reshape(mydf, direction = "wide", idvar = "Color", timevar = "Which")
    #   Color Response.Control Count.Control Response.Treatment Count.Treatment
    # 1   Red                2            10                  1              14
    # 2  Blue                3            20                  4              21
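
    The call above assumes a data frame named mydf, which this thread never shows in full. Here is a minimal sketch, assuming the two Treatment rows can be read back from the printed output (that reconstruction is an assumption, not necessarily the asker's original data):

```r
# Hypothetical reconstruction of mydf: the Control rows come from the
# question, the Treatment rows are inferred from the output printed above.
mydf <- data.frame(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21),
  stringsAsFactors = FALSE
)

# idvar identifies one output row; timevar supplies the column-name suffixes.
wide <- reshape(mydf, direction = "wide", idvar = "Color", timevar = "Which")

# reshape() leaves a "reshapeWide" attribute on the result; dropping it is optional.
attr(wide, "reshapeWide") <- NULL
print(wide)
```

    Because v.names is not given, every column other than idvar and timevar is treated as a value column; passing v.names restricts which columns get spread.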
    

    Already covered are cast/dcast from the "reshape" and "reshape2" packages (and now dcast.data.table from "data.table", which is especially useful for large datasets). From the Hadleyverse there is also "tidyr", which works nicely with the "dplyr" package:

    library(tidyr)
    library(dplyr)
    mydf %>%
      gather(var, val, Response:Count) %>%  ## make a long dataframe
      unite(RN, var, Which) %>%             ## combine the var and Which columns
      spread(RN, val)                       ## make the results wide
    #   Color Count_Control Count_Treatment Response_Control Response_Treatment
    # 1  Blue            20              21                3                  4
    # 2   Red            10              14                2                  1
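
    In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Here is a minimal sketch of the same reshaping in a single step, assuming a hypothetical mydf whose Treatment rows are inferred from the output above:

```r
library(tidyr)

# Hypothetical reconstruction of mydf (Treatment rows inferred from the output above).
mydf <- data.frame(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21),
  stringsAsFactors = FALSE
)

# Spread both value columns at once; one output row per Color.
wide <- pivot_wider(
  mydf,
  id_cols     = Color,
  names_from  = Which,
  values_from = c(Response, Count)
)
print(wide)
```

    With more than one values_from column, the new names are built from the value column's name, an underscore, and the names_from value (for example Count_Control), which matches the naming produced by the unite() step above.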
    

    Also worth noting: when this answer was first written, handling this without melting the data first was only a forthcoming feature of "data.table". The data.table implementation of dcast now accepts several value.var columns and converts them to wide format in one call, as follows:

    library(data.table)
    dcast(as.data.table(mydf), Color ~ Which, value.var = c("Response", "Count"))
    #    Color Response_Control Response_Treatment Count_Control Count_Treatment
    # 1:  Blue                3                  4            20              21
    # 2:   Red                2                  1            10              14
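
    For comparison, here is a sketch of the older two-step route that the multi-column value.var makes unnecessary: melt to long format, then cast with both the measure name and Which on the right-hand side of the formula. It assumes a hypothetical mydf whose Treatment rows are inferred from the output above:

```r
library(data.table)

# Hypothetical reconstruction of mydf as a data.table
# (Treatment rows inferred from the output above).
dt <- data.table(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21)
)

# Step 1: stack Response and Count into variable/value pairs.
long <- melt(dt, id.vars = c("Which", "Color"),
             measure.vars = c("Response", "Count"))

# Step 2: one output column per variable/Which combination.
wide <- dcast(long, Color ~ variable + Which, value.var = "value")
print(wide)
```

    The result should carry the same information as the one-call version; the direct call simply skips the intermediate long table.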
    
