问题
I know this is a duplicate Q but I can't seem to find the post again
Using the following data
df <- data.frame(A=c(1,1,2,2),B=c(NA,2,NA,4),C=c(3,NA,NA,5),D=c(NA,2,3,NA),E=c(5,NA,NA,4))
A B C D E
1 NA 3 NA 5
1 2 NA 2 NA
2 NA NA 3 NA
2 4 5 NA 4
Grouping by A
, I'd like the following output using a tidyverse
solution
A B C D E
1 2 3 2 5
2 4 5 3 4
I have many groups in A
. I think I saw an answer using coalesce
but am unsure how to get it work. I'd like a solution that works with characters
as well. Thanks!
回答1:
I haven't figured out how to put the coalesce_by_column
function inside the dplyr
pipeline, but this works:
coalesce_by_column <- function(df) {
return(coalesce(df[1], df[2]))
}
df %>%
group_by(A) %>%
summarise_all(coalesce_by_column)
## A B C D E
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 3 2 5
## 2 2 4 5 3 4
Edit: include @Jon Harmon's solution for more than 2 members of a group
# Supply lists by splicing them into dots:
coalesce_by_column <- function(df) {
return(dplyr::coalesce(!!! as.list(df)))
}
df %>%
group_by(A) %>%
summarise_all(coalesce_by_column)
#> # A tibble: 2 x 5
#> A B C D E
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 2 5
#> 2 2 4 5 3 4
回答2:
Not tidyverse
but here's one base R solution
df <- data.frame(A=c(1,1),B=c(NA,2),C=c(3,NA),D=c(NA,2),E=c(5,NA))
sapply(df, function(x) x[!is.na(x)][1])
#A B C D E
#1 2 3 2 5
With updated data
do.call(rbind, lapply(split(df, df$A), function(a) sapply(a, function(x) x[!is.na(x)][1])))
# A B C D E
#1 1 2 3 2 5
#2 2 4 5 3 4
回答3:
Here is an even more general solution (using unique
, na.omit
to sort of create coalesce
), which can handle more than two rows with overlapping information. Super simply and forward.
> df <- data.frame(A=c(1,1,2,2,2),B=c(NA,2,NA,4,4),C=c(3,NA,NA,5,NA),D=c(NA,2,3,NA,NA),E=c(5,NA,NA,4,4))
> df
A B C D E
1 1 NA 3 NA 5
2 1 2 NA 2 NA
3 2 NA NA 3 NA
4 2 4 5 NA 4
5 2 4 NA NA 4
> df %>% group_by(A) %>% summarise_all(funs( na.omit(unique(.)) ))
# A tibble: 2 x 5
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 2 5
2 2 4 5 3 4
回答4:
We can use fill
to fill all the missing values. And then filter just one row for each group.
library(dplyr)
library(tidyr)
df2 <- df %>%
group_by(A) %>%
fill(everything(), .direction = "down") %>%
fill(everything(), .direction = "up") %>%
slice(1)
回答5:
A different tidyverse
possibility could be:
df %>%
gather(var, val, -A, na.rm = TRUE) %>%
group_by(A, var) %>%
distinct(val) %>%
spread(var, val)
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 2 5
2 2 4 5 3 4
Here it, first, performs a wide-to-long data-transformation, excluding the "A" column and removing the missing values. Second, it groups by "A" column and the variable names. Third, it removes the duplicate values. Finally, it returns the data to its original wide format.
来源:https://stackoverflow.com/questions/45515218/combine-rows-in-data-frame-containing-na-to-make-complete-row