How to use the spread function properly in tidyr

死守一世寂寞 2020-12-10 17:01

How do I change the following table from:

Type    Name    Answer     n
TypeA   Apple   Yes        5
TypeA   Apple   No        10
TypeA   Apple   DK         8         


        
2 Answers
  • 2020-12-10 17:43

    Following on from ayk's comment, here is an example. It looks to me like a data_frame whose key column is a factor or character vector containing NA values cannot be spread unless you either remove those values or re-classify the data. This is specific to a data_frame (note the dplyr class with the underscore in the name): in my example, the same data with NA values spreads without error when it is a plain data.frame. For example, a slightly modified version of the example above:

    Here is the data frame:

    library(dplyr)
    library(tidyr)
    df_1 <- data_frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
                       Answer = c("Yes", "No", NA, "No"),
                       n = 1:4)
    df_1
    

    This gives a data_frame that looks like this:

    Source: local data frame [4 x 3]
    
       Type Answer     n
      (chr)  (chr) (int)
    1 TypeA    Yes     1
    2 TypeA     No     2
    3 TypeB     NA     3
    4 TypeB     No     4
    

    Then, when we try to tidy it, we get an error message:

    df_1 %>% spread(key=Answer, value=n)
    Error: All columns must be named
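
Before spreading, it can help to confirm that missing keys are actually present. A minimal sketch, using a plain data.frame copy of the data above; the variable name `n_missing_keys` is only illustrative:

```r
# Same values as df_1 above, built as a plain data.frame
df_1 <- data.frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
                   Answer = c("Yes", "No", NA, "No"),
                   n = 1:4,
                   stringsAsFactors = FALSE)

# Count the rows whose key (Answer) is NA
n_missing_keys <- sum(is.na(df_1$Answer))
n_missing_keys
```

A non-zero count means at least one row would have to become a column with no usable name.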
    

    But if we remove the NAs, then it 'works':

    df_1 %>%
        filter(!is.na(Answer)) %>%
        spread(key=Answer, value=n)
    Source: local data frame [2 x 3]
    
       Type    No   Yes
      (chr) (int) (int)
    1 TypeA     2     1
    2 TypeB     4    NA
    

    However, removing the NAs may not give you the desired result; you might want those rows to be included in your tidied table. You could modify the data directly to change the NAs to a more descriptive value. Alternatively, you could convert your data to a data.frame, and then it spreads just fine:

    as.data.frame(df_1) %>% spread(key=Answer, value=n)
       Type No Yes NA
    1 TypeA  2   1 NA
    2 TypeB  4  NA  3
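
The other option mentioned above, changing the NAs to a more descriptive value before spreading, could look roughly like this. It is only a sketch: the label "Missing" is a made-up placeholder, and the data is rebuilt as a plain data.frame so the snippet is self-contained.

```r
library(tidyr)

# Same values as df_1 above, built as a plain data.frame
df_1 <- data.frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
                   Answer = c("Yes", "No", NA, "No"),
                   n = 1:4,
                   stringsAsFactors = FALSE)

# "Missing" is a hypothetical label; choose whatever suits your data
df_labelled <- replace_na(df_1, list(Answer = "Missing"))

# Every key now has a usable name, so each one becomes its own column
wide <- spread(df_labelled, key = Answer, value = n)
wide
```

This keeps the row that had no answer, and the resulting column has an explicit name instead of NA.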
    
  • 2020-12-10 17:50

    I think only tidyr is needed to get from df_1 to df_2.

    library(magrittr)
    df_1 <- read.csv(text="Type,Name,Answer,n\nTypeA,Apple,Yes,5\nTypeA,Apple,No,10\nTypeA,Apple,DK,8\nTypeA,Apple,NA,20\nTypeA,Orange,Yes,6\nTypeA,Orange,No,11\nTypeA,Orange,DK,8\nTypeA,Orange,NA,23", stringsAsFactors=FALSE)
    
    df_2 <- df_1 %>% 
      tidyr::spread(key=Answer, value=n)
    

    Output:

       Type   Name DK No Yes NA
    1 TypeA  Apple  8 10   5 20
    2 TypeA Orange  8 11   6 23
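
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Assuming such a version is installed, the same reshaping could be sketched as follows; this is an alternative to the call above, not a change to it.

```r
library(tidyr)

# Same data as above, written one row per line for readability
csv_text <- paste("Type,Name,Answer,n",
                  "TypeA,Apple,Yes,5",
                  "TypeA,Apple,No,10",
                  "TypeA,Apple,DK,8",
                  "TypeA,Apple,NA,20",
                  "TypeA,Orange,Yes,6",
                  "TypeA,Orange,No,11",
                  "TypeA,Orange,DK,8",
                  "TypeA,Orange,NA,23",
                  sep = "\n")
df_1 <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# names_from plays the role of spread()'s key, values_from the role of value
df_2 <- pivot_wider(df_1, names_from = Answer, values_from = n)
df_2
```

Note that read.csv() treats the literal text NA as a missing value by default, so the Answer column contains real NAs here, just as in the spread() version.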
    