tidyverse alternative to left_join & rows_update when two data frames differ in columns and rows

Submitted by 风流意气都作罢 on 2021-02-05 08:50:28

Question


There might be a *_join variant for this that I'm missing, but I have two data frames where:

  1. The merging should happen in the first data frame, hence left_join
  2. I not only want to add columns, but also update existing columns in the first data frame. More specifically, I want to replace NAs in the first data frame with values from the second data frame.
  3. The second data frame contains more rows than the first one.

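Conditions #1 and #2 rule out a plain left_join: it keeps both copies of the shared column (as a.x and a.y) instead of updating the first one. Condition #3 makes rows_update fail, because it raises an error when the second data frame contains key values that do not exist in the first. So I need some intermediate steps, and I'm wondering if there's an easier way to get the desired output.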
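Both failure modes can be reproduced with a minimal sketch. It assumes dplyr is installed and R >= 4.0 (so data.frame() keeps strings as character); the names joined and res are only for illustration.

```r
library(dplyr)

x <- data.frame(id = c(1, 2, 3),
                a  = c("A", "B", NA))
y <- data.frame(id = c(1, 2, 3, 4),
                a  = c("A", "B", "C", "D"),
                q  = c("u", "v", "w", "x"))

# left_join() keeps both copies of the shared column `a`, suffixed .x and .y,
# instead of filling the NA in x$a.
joined <- left_join(x, y, by = "id")
names(joined)
# returns c("id", "a.x", "a.y", "q")

# rows_update() with its default settings refuses to run because y has id = 4,
# which has no matching row in x; tryCatch() turns that error into a flag.
res <- tryCatch(rows_update(x, select(y, id, a), by = "id"),
                error = function(e) "error")
res
# returns "error"
```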

x <- data.frame(id = c(1, 2, 3),
                a  = c("A", "B", NA))

  id    a
1  1    A
2  2    B
3  3 <NA>

y <- data.frame(id = c(1, 2, 3, 4),
                a  = c("A", "B", "C", "D"),
                q  = c("u", "v", "w", "x"))

  id a q
1  1 A u
2  2 B v
3  3 C w
4  4 D x

and the desired output would be:

  id a q
1  1 A u
2  2 B v
3  3 C w

I know I can achieve this with the following code, but it looks unnecessarily complicated to me. Is there a more direct approach that avoids the intermediate pipes inside the two commands below?

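library(tidyverse)
x %>%
  # add the columns that exist only in y (here: q)
  left_join(., y %>% select(id, q), by = c("id")) %>%
  # overwrite x's values with y's, restricted to ids that exist in x
  rows_update(., y %>% filter(id %in% x$id), by = "id")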
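If you are on dplyr 1.1.0 or later (an assumption: older versions lack the unmatched argument), rows_patch() matches condition #2 more closely than rows_update(). It only overwrites NA values, and unmatched = "ignore" skips rows of y whose key is not in x, so the filter() step is no longer needed. A sketch, where new_cols and result are illustrative names:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the `unmatched` argument

x <- data.frame(id = c(1, 2, 3),
                a  = c("A", "B", NA))
y <- data.frame(id = c(1, 2, 3, 4),
                a  = c("A", "B", "C", "D"),
                q  = c("u", "v", "w", "x"))

# Columns that exist only in y (here just "q").
new_cols <- setdiff(names(y), names(x))

result <- x %>%
  # 1. add the y-only columns; the join drops rows of y whose id is not in x
  left_join(select(y, id, all_of(new_cols)), by = "id") %>%
  # 2. fill the remaining NAs from y; y's extra id = 4 is ignored, not an error
  rows_patch(y, by = "id", unmatched = "ignore")

result
```

This should give the same three rows as the desired output, but unlike the rows_update() version it leaves non-missing values in x untouched.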

Answer 1:


You can use left_join and then coalesce, which takes x's value of a where it is present and falls back to y's value where it is missing.

library(dplyr)

x %>%
  left_join(y, by = 'id') %>%
  transmute(id, a = coalesce(a.x, a.y), q)

#  id a q
#1  1 A u
#2  2 B v
#3  3 C w
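Writing one coalesce() per shared column gets tedious when the frames share many columns. A possible generalization is sketched below. coalesce_join() is a hypothetical helper, not a dplyr function; it assumes the key columns have the same names in both frames and that the shared non-key columns have compatible types.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): left-join y onto x, then fill NAs in
# every non-key column the two frames share with the value from y.
coalesce_join <- function(x, y, by) {
  shared <- setdiff(intersect(names(x), names(y)), by)
  joined <- left_join(x, y, by = by, suffix = c(".x", ".y"))
  if (length(shared) == 0) return(joined)  # nothing to coalesce

  for (col in shared) {
    # keep x's value where present, otherwise fall back to y's
    joined[[col]] <- coalesce(joined[[paste0(col, ".x")]],
                              joined[[paste0(col, ".y")]])
  }
  joined %>%
    select(-all_of(c(paste0(shared, ".x"), paste0(shared, ".y")))) %>%
    relocate(all_of(names(x)))  # put x's columns first, in x's order
}

x <- data.frame(id = c(1, 2, 3),
                a  = c("A", "B", NA))
y <- data.frame(id = c(1, 2, 3, 4),
                a  = c("A", "B", "C", "D"),
                q  = c("u", "v", "w", "x"))

coalesce_join(x, y, by = "id")
```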


Source: https://stackoverflow.com/questions/65595893/tidyverse-alternative-to-left-join-rows-update-when-two-data-frames-differ-in
