How to replace NAs of a variable with values from another dataframe

拟墨画扇 submitted on 2019-12-02 08:15:55

Here's a quick solution using data.table's binary join. It joins only gender with sex and leaves all the other columns untouched. Note that gender is overwritten for every ID that has a match in df2, not only where it is NA.

library(data.table)
setkey(setDT(df1), ID)
df1[df2, gender := i.sex][]
#     ID gender
#  1:  1      2
#  2:  2      2
#  3:  3      1
#  4:  4      2
#  5:  5      2
#  6:  6      2
#  7:  7      2
#  8:  8      2
#  9:  9      2
# 10: 10      2
# 11: 11      2
# 12: 12      2
# 13: 13      1
# 14: 14      1
# 15: 15      2
# 16: 16      2
# 17: 17      2
# 18: 18      2
# 19: 19      2
# 20: 20      2
# 21: 21      1
# 22: 22      2
# 23: 23      2
# 24: 24      2
# 25: 25      2
# 26: 26      2
# 27: 27      2
# 28: 28      2
# 29: 29      2
# 30: 30      2
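If the existing non-NA values of gender should be kept, the same update join can be restricted to the missing entries. This is a minimal sketch on made-up toy data, assuming data.table 1.12.4 or later (for fcoalesce) and that gender and sex have the same type:

```r
library(data.table)

# Hypothetical toy data: gender has gaps, sex is complete
df1 <- data.table(ID = 1:4, gender = c(1L, NA, 2L, NA))
df2 <- data.table(ID = 1:4, sex = c(2L, 1L, 1L, 2L))

# Update join on ID: keep gender where it is present,
# otherwise fall back to the matching sex value from df2
df1[df2, on = "ID", gender := fcoalesce(gender, i.sex)]
```

Here ID 1 keeps its original gender even though df2 holds a different value for it; only IDs 2 and 4 are filled.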

This is probably simplest in base R, provided df1 and df2 contain the same IDs in the same row order:

idx <- is.na(df1$gender)
df1$gender[idx] <- df2$sex[idx]
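The positional indexing above only lines up when both data frames list the same IDs in the same order. When that is not guaranteed, the IDs can be looked up with match() instead. A minimal sketch on made-up toy data, in which df2 is shuffled and lacks one ID:

```r
# Hypothetical toy data: df2 is in a different row order and has no ID 4
df1 <- data.frame(ID = 1:5, gender = c(1L, NA, 2L, NA, NA))
df2 <- data.frame(ID = c(5L, 3L, 2L, 1L), sex = c(2L, 2L, 1L, 1L))

# Rows of df1 that need filling
idx <- is.na(df1$gender)

# For each of those rows, find the position of its ID in df2 (NA if absent)
pos <- match(df1$ID[idx], df2$ID)

# Copy the matched values; IDs without a match in df2 stay NA
df1$gender[idx] <- df2$sex[pos]
```

In this example, IDs 2 and 5 are filled from df2, while ID 4 remains NA because df2 has no entry for it.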

With dplyr, you could drop gender from df1 and pull in sex from df2 instead:

library(dplyr)
df1 %>% select(ID) %>% left_join(df2, by = "ID")
#   ID sex
#1   1   2
#2   2   2
#3   3   1
#4   4   2
#5   5   2
#6   6   2
#.. ..  

This assumes, as in the example, that all IDs from df1 are also present in df2 and have sex/gender information there.


If you have other columns in your data you could also try this instead:

df1 %>% select(-gender) %>% left_join(df2[c("ID", "sex")], by = "ID")
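
The joins above replace gender wholesale with sex from df2. If only the NA entries should be filled and the column should keep the name gender, one option is to combine the join with coalesce(). This is a sketch on made-up toy data (the age column is hypothetical and only stands in for "other columns"), assuming gender and sex have the same type:

```r
library(dplyr)

# Hypothetical toy data with an extra column in df1
df1 <- data.frame(ID = 1:4, gender = c(1L, NA, 2L, NA), age = c(30, 41, 25, 52))
df2 <- data.frame(ID = 1:4, sex = c(2L, 1L, 1L, 2L))

result <- df1 %>%
  left_join(df2[c("ID", "sex")], by = "ID") %>%  # bring sex alongside gender
  mutate(gender = coalesce(gender, sex)) %>%     # take sex only where gender is NA
  select(-sex)                                   # drop the helper column
```

The other columns of df1 pass through unchanged, and existing gender values are not overwritten.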