spread for duplicate identifiers [duplicate]

Submitted by Deadly on 2020-01-30 11:34:11

Question


I'm really sorry to ask this question again, because there are already many questions about this. However, none of the solutions worked for my problem.

My data looks like this:

id scale rater rating  
1   A      1      5
1   B      1      7
1   A      2      3
1   B      2      6
2   A      1      4
2   B      1      3
2   A      2      2
2   B      2      1

I want to call spread(rater, rating) on it.

In the end it should look like this:

id scale   1      2  
1   A      5      3
1   B      7      6
2   A      4      2
2   B      3      1

The problem, as far as I can tell, is that the rows in the first dataset don't have unique identifiers. None of the solutions from similar questions seem to work for me. I can't just delete duplicate rows. When I add row numbers or grouped identifiers with group_by(id) %>% mutate(grouped_id = row_number()), the two raters don't end up side by side in one row; instead each rating gets its own row, with NA for the other rater.

I feel like I tried everything I could find and would really appreciate some help! Thank you very much in advance!


Answer 1:


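We can use the spread function directly, without having to group_by anything (thanks @Jaap). In the example data every combination of id, scale and rater occurs exactly once, so id and scale already identify each output row uniquely: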

library(tidyr)

dat %>%
    spread(rater, rating)

# A tibble: 4 x 4
     id scale   `1`   `2`
  <int> <chr> <int> <int>
1     1 A         5     3
2     1 B         7     6
3     2 A         4     2
4     2 B         3     1
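
In tidyr 1.0.0 and later, spread is superseded by pivot_wider. The following is a minimal sketch of the same reshaping with pivot_wider, assuming a sufficiently recent tidyr is installed. The names_prefix argument is optional; it is added here only so the new columns get syntactic names (rater_1, rater_2) instead of `1` and `2`.

```r
library(tidyr)

# Same example data as in the "Data" section of this answer
dat <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  scale  = c("A", "B", "A", "B", "A", "B", "A", "B"),
  rater  = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  rating = c(5L, 7L, 3L, 6L, 4L, 3L, 2L, 1L)
)

# One row per id/scale combination, one column per rater
wide <- pivot_wider(dat,
                    names_from   = rater,
                    values_from  = rating,
                    names_prefix = "rater_")
wide
```

This should give the same four rows as the spread call above, with one column per rater.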

Edit: using reshape

Although I would almost never advise using the reshape function over the gather and spread functions, here's how you could do it using base R:

reshape(dat, direction = 'wide',
        idvar = c('id','scale'),
        v.names = 'rating',
        timevar = 'rater')

  id scale rating.1 rating.2
1  1     A        5        3
2  1     B        7        6
5  2     A        4        2
6  2     B        3        1
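
The row names 1, 2, 5, 6 in this output are carried over from the original long data, where those rows hold the first occurrence of each id/scale pair. If consecutive row names are preferred, they can be reset afterwards. A small sketch in base R (the variable name wide is only for illustration):

```r
# Same example data as in the "Data" section of this answer
dat <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  scale  = c("A", "B", "A", "B", "A", "B", "A", "B"),
  rater  = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  rating = c(5L, 7L, 3L, 6L, 4L, 3L, 2L, 1L)
)

wide <- reshape(dat, direction = "wide",
                idvar   = c("id", "scale"),
                v.names = "rating",
                timevar = "rater")

# Reset the row names to 1..n
rownames(wide) <- NULL
# Optionally drop the bookkeeping attribute that reshape attaches
attr(wide, "reshapeWide") <- NULL
wide
```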

Data

dat <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), 
               scale = c("A", "B", "A", "B", "A", "B", "A", "B"), 
               rater = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), 
               rating = c(5L, 7L, 3L, 6L, 4L, 3L, 2L, 1L)),
          class = "data.frame", row.names = c(NA, -8L))


Source: https://stackoverflow.com/questions/51192050/spread-for-duplicate-identifiers
