Reshape R data with user entries in rows, collapsing for each user

没有蜡笔的小新 2020-12-11 04:27

Pardon my newness to the R world, and thank you kindly in advance for your help.

I would like to analyze the data from an experiment.

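The data comes in long format, with each user's entries spread across several rows, and I would like to collapse them into one row per user.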

3 answers
  • 2020-12-11 04:57

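    With data.table, you can use dcast:

    library(data.table)
    dt <- as.data.table(df)  # df: the long-format data frame (see "Data" in the last answer)
    # gender is not in the formula, so it is dropped from the result
    dcast(dt, User_id + location + age ~ Item, value.var = "Resp", fill = 0L)
    #    User_id location age  A  B  C  D  E  G  H
    # 1:       1       CA  22  1 -1 -1  1 -1  0  0
    # 2:       2       MD  27 -1  1  1  0  1 -1 -1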
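
    If gender (or any other per-user column) should be kept, one possible variant is to let `...` in the formula stand for every column not otherwise named. The sketch below is self-contained: it rebuilds the sample data by hand with whitespace-free labels, which is an assumption for illustration rather than the asker's actual object.

    ```r
    library(data.table)

    # Hand-built copy of the sample data (assumed: labels without leading spaces)
    dt <- data.table(
      User_id  = rep(1:2, times = c(5, 6)),
      location = rep(c("CA", "MD"), times = c(5, 6)),
      age      = rep(c(22L, 27L), times = c(5, 6)),
      gender   = rep(c("M", "F"), times = c(5, 6)),
      Item     = c("A", "B", "C", "D", "E", "A", "B", "C", "E", "G", "H"),
      Resp     = c(1, -1, -1, 1, -1, -1, 1, 1, 1, -1, -1)
    )

    # `...` means "all remaining columns" (everything except Item and Resp),
    # so gender is kept as an id column; missing cells are filled with 0
    wide <- dcast(dt, ... ~ Item, value.var = "Resp", fill = 0)
    wide
    ```

    This avoids listing the id columns by hand, at the cost of silently picking up any extra column that happens to be in the table.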
    
  • 2020-12-11 04:59

    There’s a package called tidyr that makes melting and reshaping data frames much easier. In your case, you can use tidyr::spread straightforwardly:

    library(tidyr)
    result = spread(df, Item, Resp)
    

    This will, however, fill missing entries with NA:

      User_id location age gender  A  B  C  D  E  G  H
    1       1       CA  22      M  1 -1 -1  1 -1 NA NA
    2       2       MD  27      F -1  1  1 NA  1 -1 -1
    

    You can fix this by replacing them:

    result[is.na(result)] = 0
    result
    #   User_id location age gender  A  B  C  D  E  G  H
    # 1       1       CA  22      M  1 -1 -1  1 -1  0  0
    # 2       2       MD  27      F -1  1  1  0  1 -1 -1
    

    … or by using the fill argument:

    result = spread(df, Item, Resp, fill = 0)
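
    In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same reshape, assuming a recent tidyr and a hand-built copy of the sample data (labels without leading spaces, which is an assumption for illustration):

    ```r
    library(tidyr)

    # Hand-built copy of the sample data (assumed: labels without leading spaces)
    df <- data.frame(
      User_id  = rep(1:2, times = c(5, 6)),
      location = rep(c("CA", "MD"), times = c(5, 6)),
      age      = rep(c(22L, 27L), times = c(5, 6)),
      gender   = rep(c("M", "F"), times = c(5, 6)),
      Item     = c("A", "B", "C", "D", "E", "A", "B", "C", "E", "G", "H"),
      Resp     = c(1, -1, -1, 1, -1, -1, 1, 1, 1, -1, -1)
    )

    # One column per Item; combinations with no response are filled with 0.
    # The named-list form of values_fill is used because it is accepted
    # across tidyr 1.x releases.
    wide <- pivot_wider(df, names_from = Item, values_from = Resp,
                        values_fill = list(Resp = 0))
    wide
    ```

    The result is a tibble with one row per user; the new columns appear in the order the items are first seen in the data.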
    

    For completeness’ sake, the other way round (i.e. reproducing the original data.frame) works via gather (this is usually known as “melting”):

    gather(result, Item, Resp, A : H)
    

    The last argument here tells gather which columns to gather (and it supports the concise range syntax).
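
    Likewise, gather() is superseded by pivot_longer() in tidyr 1.0.0 and later. The sketch below assumes a wide table that still holds NA for the missing responses (hand-built here for illustration), so that dropping those rows recovers the 11 original entries:

    ```r
    library(tidyr)

    # Hand-built wide table with NA where a user gave no response (assumed data)
    wide <- data.frame(
      User_id  = 1:2,
      location = c("CA", "MD"),
      age      = c(22L, 27L),
      gender   = c("M", "F"),
      A = c(1, -1), B = c(-1, 1), C = c(-1, 1), D = c(1, NA),
      E = c(-1, 1), G = c(NA, -1), H = c(NA, -1)
    )

    # Stack columns A through H back into Item/Resp pairs,
    # dropping the NA cells that had no counterpart in the long data
    long <- pivot_longer(wide, cols = A:H, names_to = "Item",
                         values_to = "Resp", values_drop_na = TRUE)
    long
    ```

    If the missing cells were already replaced by 0, they come back as ordinary rows with Resp equal to 0, so the round trip is only exact when the NAs are kept until this step.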

  • 2020-12-11 05:09

    Here's the always elegant stats::reshape version:

    (newdf <- reshape(df, direction = "wide", timevar = "Item", idvar = names(df)[1:4]))
    #   User_id location age gender Resp. A Resp. B Resp. C Resp. D Resp. E Resp. G Resp. H
    # 1       1       CA  22      M       1      -1      -1       1      -1      NA      NA
    # 6       2       MD  27      F      -1       1       1      NA       1      -1      -1
    

    Missing values get filled with NA in reshape(), and the names are not what we want. So we'll need to do a bit more work. Here we can change the names and replace the NAs with zero in the same line to arrive at your desired result.

    replace(setNames(newdf, sub(".* ", "", names(newdf))), is.na(newdf), 0)
    #   User_id location age gender  A  B  C D  E  G  H
    # 1       1       CA  22      M  1 -1 -1 1 -1  0  0
    # 6       2       MD  27      F -1  1  1 0  1 -1 -1
    

    Of course, the code would definitely be more legible if we broke this up into two separate lines. Also, note that there is no F in Item in your original data, hence the difference in output from yours.
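
    Split into separate steps, the renaming and the NA replacement might look like the sketch below. It is self-contained and uses a hand-built copy of the sample data with whitespace-free labels (an assumption for illustration), so reshape() names the new columns Resp.A, Resp.B, and so on, and the pattern passed to sub() differs from the one above accordingly.

    ```r
    # Hand-built copy of the sample data (assumed: labels without leading spaces)
    df <- data.frame(
      User_id  = rep(1:2, times = c(5, 6)),
      location = rep(c("CA", "MD"), times = c(5, 6)),
      age      = rep(c(22L, 27L), times = c(5, 6)),
      gender   = rep(c("M", "F"), times = c(5, 6)),
      Item     = c("A", "B", "C", "D", "E", "A", "B", "C", "E", "G", "H"),
      Resp     = c(1, -1, -1, 1, -1, -1, 1, 1, 1, -1, -1)
    )

    # Step 1: long to wide, one Resp.<Item> column per item
    wide <- reshape(df, direction = "wide", timevar = "Item",
                    idvar = c("User_id", "location", "age", "gender"))

    # Step 2: strip the "Resp." prefix from the new column names
    names(wide) <- sub("^Resp\\.", "", names(wide))

    # Step 3: replace the NAs left by missing items with 0
    wide[is.na(wide)] <- 0
    wide
    ```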

    Data:

    df <- structure(list(User_id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 2L), location = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L), .Label = c(" CA", " MD"), class = "factor"), age = c(22L, 
    22L, 22L, 22L, 22L, 27L, 27L, 27L, 27L, 27L, 27L), gender = structure(c(2L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c(" F", " M"
    ), class = "factor"), Item = structure(c(1L, 2L, 3L, 4L, 5L, 
    1L, 2L, 3L, 5L, 6L, 7L), .Label = c(" A", " B", " C", " D", " E", 
    " G", " H"), class = "factor"), Resp = c(1, -1, -1, 1, -1, -1, 
    1, 1, 1, -1, -1)), .Names = c("User_id", "location", "age", "gender", 
    "Item", "Resp"), class = "data.frame", row.names = c(NA, -11L
    ))
    