R collapse multiple rows into 1 row - same columns

This is piggy backing on a question I answered last night as I am reconsidering how I'd like to format my data. I did search but couldn't find up with any applicable answe

3 Answers

我在风中等你

2020-12-18 07:38
You can reshape to long format, drop the blank entries and then go back to wide:
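```
library(data.table)

# df must be a data.table (e.g. via setDT(df)) for the [ value != "" ] filter to work
res <- dcast(melt(df, id.vars = "record_numb")[ value != "" ], record_numb ~ variable)
res
#    record_numb col_a col_b col_c
# 1:           1   123   234   543
# 2:           2   987   765   543
```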
You may find it more readable at first using magrittr:
```
library(magrittr)
res = df %>% 
  melt(id.vars = "record_numb") %>% 
  .[ value != "" ] %>% 
  dcast(record_numb ~ variable)
```
The numbers are still formatted as strings, but you can convert them with...
```
cols = setdiff(names(res), "record_numb")
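# as.is = TRUE keeps non-numeric strings as character and avoids the
# warning that recent R versions give when as.is is left unspecified
res[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]
```
Type conversion changes each column to the class its contents suggest (logical, integer, numeric, and so on). See ?type.convert.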
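For anyone who wants to run the steps above end to end, here is a minimal sketch. The sample data is hypothetical (the question's data is not shown here); it assumes string columns where each record has exactly one non-blank value per column, which is what the output above suggests.

```r
library(data.table)

# Hypothetical sample data: one non-blank string per column within each record
df <- data.table(
  record_numb = c(1, 1, 1, 2, 2, 2),
  col_a = c("123", "", "", "987", "", ""),
  col_b = c("", "234", "", "", "765", ""),
  col_c = c("", "", "543", "", "", "543")
)

# Wide -> long, drop the blank entries, then long -> wide again
long <- melt(df, id.vars = "record_numb")
res <- dcast(long[value != ""], record_numb ~ variable, value.var = "value")

# Convert the string columns to the type their contents suggest
cols <- setdiff(names(res), "record_numb")
res[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]

# Inspect the resulting column classes
sapply(res, class)
```

If a record had more than one non-blank value in the same column, dcast would need an aggregation function, so this only fits data shaped like the sample.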

逝去的感伤

2020-12-18 07:40

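Just do this (it assumes the columns are numeric, with NA where a row has no value):

```
library(dplyr)

df = df %>% group_by(record_numb) %>%
    summarise(col_a = sum(col_a, na.rm = TRUE),
              col_b = sum(col_b, na.rm = TRUE),
              col_c = sum(col_c, na.rm = TRUE))
```

In place of `sum` you could use `min`, `max`, or any other summary function.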
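If there are many columns, the same summary can be written without naming each one. This is only a sketch: the sample data is hypothetical, and it assumes numeric columns with NA for the empty cells and a dplyr version that provides `across()` (1.0.0 or later).

```r
library(dplyr)

# Hypothetical sample data: numeric columns, NA where a row has no value
df <- tibble(
  record_numb = c(1, 1, 1, 2, 2, 2),
  col_a = c(123, NA, NA, 987, NA, NA),
  col_b = c(NA, 234, NA, NA, 765, NA),
  col_c = c(NA, NA, 543, NA, NA, 543)
)

# Apply the same summary to every non-grouping column
res <- df %>%
  group_by(record_numb) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))

res
```

Inside a grouped `summarise`, `everything()` skips the grouping column, so only col_a, col_b and col_c are summed.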

青春惊慌失措

2020-12-18 07:55
As you suggested that you would like a data.table solution in your comment, you could use
```
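library(data.table)
# record_numb, col_a, col_b, col_c are assumed to already exist as vectors
df <- data.table(record_numb, col_a, col_b, col_c)

# Concatenate each group's strings; blank entries contribute nothing
df[, lapply(.SD, paste0, collapse = ""), by = record_numb]
#    record_numb col_a col_b col_c
# 1:           1   123   234   543
# 2:           2   987   765   543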
```
.SD stands for all the columns in the data.table except those named in the by argument. In @Frank's answer, he restricts that set of columns using .SDcols. If you also want to cast the variables to numeric, you can still do it in one line by chaining:
```
df[, lapply(.SD, paste0, collapse=""), by=record_numb][, lapply(.SD, as.integer)]
```
The second "chain" casts all the variables as integers.
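Here is a self-contained sketch of that chain. The sample data is hypothetical and assumes each record has one non-blank string per column, so pasting a group's values together leaves just that value.

```r
library(data.table)

# Hypothetical sample data: blanks are empty strings
df <- data.table(
  record_numb = c(1L, 1L, 1L, 2L, 2L, 2L),
  col_a = c("123", "", "", "987", "", ""),
  col_b = c("", "234", "", "", "765", ""),
  col_c = c("", "", "543", "", "", "543")
)

# First link: collapse each group's strings into one string per column.
# Second link: convert every column of that result to integer.
res <- df[, lapply(.SD, paste0, collapse = ""), by = record_numb][
  , lapply(.SD, as.integer)]

# Inspect the resulting column classes
sapply(res, class)
```

Note that the second link has no by, so .SD there includes record_numb and it is converted along with the other columns. If a group held two non-blank values in one column, paste0 would glue them into a single number, so this only suits data shaped like the sample.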