Reshape R data with user entries in rows, collapsing for each user


没有蜡笔的小新

Pardon my new-ness to the R world, thank you kindly in advance for your help.

I would like to analyze the data from an experiment.

The data comes in in Long


3 Answers

误落风尘

2020-12-11 04:57

Using data.table you can do:

```
library(data.table)

# dt is assumed to be the long-format data as a data.table, e.g. as.data.table(df)
dcast(dt, User_id + location + age ~ Item, value.var = "Resp", fill = 0L)
#    User_id location age  A  B  C  D  E  G  H
# 1:       1       CA  22  1 -1 -1  1 -1  0  0
# 2:       2       MD  27 -1  1  1  0  1 -1 -1
```


[愿得一人]

2020-12-11 04:59
There’s a package called tidyr that makes melting and reshaping data frames much easier. In your case, you can use tidyr::spread straightforwardly:
```
library(tidyr)

result = spread(df, Item, Resp)
```
This will however fill missing entries with NA:
```
  User_id location age gender  A  B  C  D  E  G  H
1       1       CA  22      M  1 -1 -1  1 -1 NA NA
2       2       MD  27      F -1  1  1 NA  1 -1 -1
```
You can fix this by replacing them:
```
result[is.na(result)] = 0
result
#   User_id location age gender  A  B  C  D  E  G  H
# 1       1       CA  22      M  1 -1 -1  1 -1  0  0
# 2       2       MD  27      F -1  1  1  0  1 -1 -1
```
… or by using the fill argument:
```
result = spread(df, Item, Resp, fill = 0)
```
For completeness’ sake, the other way round (i.e. reproducing the original data.frame) works via gather (this is usually known as “melting”):
```
gather(result, Item, Resp, A : H)
```
The last argument tells gather which columns to gather; it supports the concise range syntax used here (A : H).
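Since tidyr 1.0.0, spread and gather are superseded by pivot_wider and pivot_longer. A minimal sketch of the same round trip with the newer functions follows. It assumes a small stand-in data frame with plain character columns rather than the question's exact data, and the names wide and long are illustrative only.

```r
library(tidyr)

# Stand-in for the long-format data (illustrative, not the question's full data).
df <- data.frame(
  User_id  = c(1L, 1L, 1L, 2L, 2L),
  location = c("CA", "CA", "CA", "MD", "MD"),
  age      = c(22L, 22L, 22L, 27L, 27L),
  gender   = c("M", "M", "M", "F", "F"),
  Item     = c("A", "B", "D", "A", "G"),
  Resp     = c(1, -1, 1, -1, -1),
  stringsAsFactors = FALSE
)

# Long to wide: one column per Item, missing combinations filled with 0.
wide <- pivot_wider(df, names_from = Item, values_from = Resp, values_fill = 0)

# Wide back to long: gather the item columns A through G again.
long <- pivot_longer(wide, cols = A:G, names_to = "Item", values_to = "Resp")
```

Note that the round trip does not reproduce the original rows exactly: the zeros filled in by values_fill come back as explicit rows in long.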

后悔当初

2020-12-11 05:09

Here's the always elegant stats::reshape version:

```
(newdf <- reshape(df, direction = "wide", timevar = "Item", idvar = names(df)[1:4]))
#   User_id location age gender Resp. A Resp. B Resp. C Resp. D Resp. E Resp. G Resp. H
# 1       1       CA  22      M       1      -1      -1       1      -1      NA      NA
# 6       2       MD  27      F      -1       1       1      NA       1      -1      -1
```

Missing values get filled with NA in reshape(), and the names are not what we want. So we'll need to do a bit more work. Here we can change the names and replace the NAs with zero in the same line to arrive at your desired result.

```
replace(setNames(newdf, sub(".* ", "", names(newdf))), is.na(newdf), 0)
#   User_id location age gender  A  B  C D  E  G  H
# 1       1       CA  22      M  1 -1 -1 1 -1  0  0
# 6       2       MD  27      F -1  1  1 0  1 -1 -1
```

Of course, the code would definitely be more legible if we broke this up into two separate lines. Also, note that there is no F in Item in your original data, hence the difference in output from yours.
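As a rough sketch of that more legible form, the renaming and the NA replacement can be done as two separate statements. The example below assumes a small stand-in data frame with plain character columns, not the space-padded factors from the question, so the prefix pattern "^Resp\\." is used in place of the ".* " pattern.

```r
# Stand-in for the long-format data (illustrative, not the question's full data).
df <- data.frame(
  User_id  = c(1L, 1L, 1L, 2L, 2L),
  location = c("CA", "CA", "CA", "MD", "MD"),
  age      = c(22L, 22L, 22L, 27L, 27L),
  gender   = c("M", "M", "M", "F", "F"),
  Item     = c("A", "B", "D", "A", "G"),
  Resp     = c(1, -1, 1, -1, -1),
  stringsAsFactors = FALSE
)

# Long to wide: one row per user, one "Resp.<Item>" column per item.
newdf <- reshape(df, direction = "wide", timevar = "Item",
                 idvar = c("User_id", "location", "age", "gender"))

# Step 1: strip the "Resp." prefix so the columns are named after the items.
names(newdf) <- sub("^Resp\\.", "", names(newdf))

# Step 2: replace the NAs that reshape() introduced with 0.
newdf[is.na(newdf)] <- 0
```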

Data:

```
df <- structure(list(User_id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L), location = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c(" CA", " MD"), class = "factor"), age = c(22L, 
22L, 22L, 22L, 22L, 27L, 27L, 27L, 27L, 27L, 27L), gender = structure(c(2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c(" F", " M"
), class = "factor"), Item = structure(c(1L, 2L, 3L, 4L, 5L, 
1L, 2L, 3L, 5L, 6L, 7L), .Label = c(" A", " B", " C", " D", " E", 
" G", " H"), class = "factor"), Resp = c(1, -1, -1, 1, -1, -1, 
1, 1, 1, -1, -1)), .Names = c("User_id", "location", "age", "gender", 
"Item", "Resp"), class = "data.frame", row.names = c(NA, -11L
))
```
