R collapse multiple rows into 1 row - same columns

清歌不尽 2020-12-18 07:18

This is piggybacking on a question I answered last night, as I am reconsidering how I'd like to format my data. I did search but couldn't find any applicable answers…

3 Answers
  • 2020-12-18 07:38

    You can reshape to long format, drop the blank entries and then go back to wide:

    library(data.table)
    setDT(df)  # convert df to a data.table in place, if it is not one already
    res <- dcast(melt(df, id.vars = "record_numb")[value != ""], record_numb ~ variable)
    
       record_numb col_a col_b col_c
    1:           1   123   234   543
    2:           2   987   765   543
    

    You may find it more readable at first using magrittr:

    library(magrittr)
    res = df %>% 
      melt(id.vars = "record_numb") %>% 
      .[ value != "" ] %>% 
      dcast(record_numb ~ variable)
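
    Here is a self-contained sketch of the same three steps, split apart so each intermediate result can be inspected. The input table is hypothetical (the question's data is not shown here); it is built to match the output above, with "" marking empty cells.

    ```r
    library(data.table)

    # Hypothetical input: each record is spread over several rows and every
    # row fills in only one value column ("" marks an empty cell).
    df <- data.table(
      record_numb = c(1, 1, 1, 2, 2, 2),
      col_a = c("123", "", "", "987", "", ""),
      col_b = c("", "234", "", "", "765", ""),
      col_c = c("", "", "543", "", "", "543")
    )

    # Step 1: wide -> long, one row per (record_numb, variable, value).
    long <- melt(df, id.vars = "record_numb")

    # Step 2: drop the empty cells, leaving one value per record/column pair.
    long <- long[value != ""]

    # Step 3: long -> wide again; dcast() picks the "value" column by default.
    res <- dcast(long, record_numb ~ variable)
    print(res)
    ```

    If a record had two non-empty values for the same column, dcast() would need a fun.aggregate to combine them, so this approach assumes at most one value per cell.
    
    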
    

    The numbers are still stored as strings, but you can convert them with:

    cols = setdiff(names(res), "record_numb")
    # convert every value column in place; as.is = TRUE keeps text as character
    res[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]
    

    type.convert() changes each column to whatever type its values look like (logical, integer, double, complex or character). See ?type.convert.
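
    A minimal sketch of the conversion step on a hypothetical, all-character result table. Passing as.is = TRUE explicitly is an assumption about the desired behaviour: non-numeric text stays character instead of becoming a factor, and it avoids the warning some R versions give when as.is is omitted.

    ```r
    library(data.table)

    # Hypothetical dcast() result in which every value column is still character.
    res <- data.table(
      record_numb = c(1, 2),
      col_a = c("123", "987"),
      col_b = c("234", "765"),
      col_c = c("543", "543")
    )

    cols <- setdiff(names(res), "record_numb")
    # Whole numbers such as "123" are converted to integer columns.
    res[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]

    sapply(res, class)
    ```
    
    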

  • 2020-12-18 07:40

    Just do this:

    library(dplyr)
    df = df %>% group_by(record_numb) %>%
        summarise(col_a = sum(col_a, na.rm = TRUE),
                  col_b = sum(col_b, na.rm = TRUE),
                  col_c = sum(col_c, na.rm = TRUE))
    

    In place of sum you could use min, max, or any other summary function. This assumes the value columns are numeric, with NA (not "") in the empty cells.
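
    With many value columns, dplyr's across() (available from dplyr 1.0.0; a swapped-in variant, not part of the original answer) can apply the same summary to every non-grouping column without listing each one. The data below is hypothetical: numeric columns with NA in the empty cells.

    ```r
    library(dplyr)

    # Hypothetical input: numeric value columns, NA marks an empty cell.
    df <- data.frame(
      record_numb = c(1, 1, 1, 2, 2, 2),
      col_a = c(123, NA, NA, 987, NA, NA),
      col_b = c(NA, 234, NA, NA, 765, NA),
      col_c = c(NA, NA, 543, NA, NA, 543)
    )

    # Inside summarise(), across(everything(), ...) skips the grouping column
    # and applies sum(..., na.rm = TRUE) to every remaining column.
    res <- df %>%
      group_by(record_numb) %>%
      summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

    print(res)
    ```

    One caveat: sum(x, na.rm = TRUE) returns 0 for a group whose values are all NA, so a cell that is missing for a whole record comes back as 0 rather than NA.
    
    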

  • 2020-12-18 07:55

    Since you mentioned in a comment that you would like a data.table solution, you could use:

    library(data.table)
    df <- data.table(record_numb,col_a,col_b,col_c)
    
    df[, lapply(.SD, paste0, collapse=""), by=record_numb]
       record_numb col_a col_b col_c
    1:           1   123   234   543
    2:           2   987   765   543
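
    The snippet above assumes the vectors record_numb, col_a, col_b and col_c already exist. Here is a self-contained sketch with hypothetical vectors ("" for empty cells), which also shows why the empty cells must be "" rather than NA.

    ```r
    library(data.table)

    # Hypothetical vectors; "" cells vanish when the group is pasted together.
    record_numb <- c(1, 1, 1, 2, 2, 2)
    col_a <- c("123", "", "", "987", "", "")
    col_b <- c("", "234", "", "", "765", "")
    col_c <- c("", "", "543", "", "", "543")
    df <- data.table(record_numb, col_a, col_b, col_c)

    # For each record_numb, paste0(..., collapse = "") joins the group's values
    # in every non-grouping column into a single string.
    res <- df[, lapply(.SD, paste0, collapse = ""), by = record_numb]
    print(res)

    # Caveat: NA is pasted as the literal text "NA".
    na_demo <- paste0(c("123", NA), collapse = "")
    ```

    Note also that if a record has two non-empty values in the same column, paste0() silently concatenates them ("123" and "456" become "123456").
    
    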
    

    .SD ("Subset of Data") basically says "take all the columns in my data.table except those in the by argument". In @Frank's answer, he restricts the set of columns with .SDcols. If you want to convert the columns to integers, you can still do this in one line by chaining:

    df[, lapply(.SD, paste0, collapse=""), by=record_numb][, lapply(.SD, as.integer)]
    

    The second "chain" converts every column, including record_numb, to integer.
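
    A small sketch on hypothetical data contrasting .SD with and without .SDcols, followed by the chained conversion.

    ```r
    library(data.table)

    # Hypothetical data: two records, each spread over two rows.
    dt <- data.table(
      record_numb = c(1, 1, 2, 2),
      col_a = c("12", "", "98", ""),
      col_b = c("", "34", "", "76")
    )

    # Without .SDcols, .SD holds every column except the by column(s).
    sd_names <- dt[, .(sd_names = paste(names(.SD), collapse = ",")), by = record_numb]
    print(sd_names)

    # .SDcols restricts .SD to the listed columns, so only col_a is collapsed.
    only_a <- dt[, lapply(.SD, paste0, collapse = ""), by = record_numb, .SDcols = "col_a"]

    # Chaining: the second [...] runs on the collapsed table and converts
    # every column, record_numb included, to integer.
    res <- dt[, lapply(.SD, paste0, collapse = ""), by = record_numb][, lapply(.SD, as.integer)]
    print(res)
    ```
    
    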
