dplyr join define NA values

烈酒焚心 提交于 2019-12-18 12:54:34

问题


Can I define a "fill" value for NA in dplyr join? For example in the join define that all NA values should be 1?

require(dplyr)
lookup <- data.frame(cbind(c("USD","MYR"),c(0.9,1.1)))
names(lookup) <- c("rate","value")
fx <- data.frame(c("USD","MYR","USD","MYR","XXX","YYY"))
names(fx)[1] <- "rate"
left_join(x=fx,y=lookup,by=c("rate"))

Above code will create NA for values "XXX" and "YYY". In my case I am joining a large number of columns and there will be a lot of non-matches. All non-matches should have the same value. I know I can do it in several steps but the question is can all be done in one? Thanks!


回答1:


First off, I would like to recommend not to use the combination data.frame(cbind(...)). Here's why: cbind creates a matrix by default if you only pass atomic vectors to it. And matrices in R can only have one type of data (think of matrices as a vector with dimension attribute, i.e. number of rows and columns). Therefore, your code

cbind(c("USD","MYR"),c(0.9,1.1))

creates a character matrix:

str(cbind(c("USD","MYR"),c(0.9,1.1)))
# chr [1:2, 1:2] "USD" "MYR" "0.9" "1.1"

although you probably expected a final data frame with a character or factor column (rate) and a numeric column (value). But what you get is:

str(data.frame(cbind(c("USD","MYR"),c(0.9,1.1))))
#'data.frame':  2 obs. of  2 variables:
# $ X1: Factor w/ 2 levels "MYR","USD": 2 1
# $ X2: Factor w/ 2 levels "0.9","1.1": 1 2

because strings (characters) are converted to factors when using data.frame by default (You can circumvent this by specifying stringsAsFactors = FALSE in the data.frame() call).

I suggest the following alternative approach to create the sample data (also note that you can easily specify the column names in the same call):

lookup <- data.frame(rate = c("USD","MYR"), 
                     value = c(0.9,1.1))

fx <- data.frame(rate = c("USD","MYR","USD","MYR","XXX","YYY"))

Now, for you actual question, if I understand correctly, you want to replace all NAs with a 1 in the joined data. If that's correct, here's a custom function using left_join and mutate_each to do that:

library(dplyr)
left_join_NA <- function(x, y, ...) {
  left_join(x = x, y = y, by = ...) %>% 
    mutate_each(funs(replace(., which(is.na(.)), 1)))
}

Now you can apply it to your data like this:

> left_join_NA(x = fx, y = lookup, by = "rate")
#  rate value
#1  USD   0.9
#2  MYR   1.1
#3  USD   0.9
#4  MYR   1.1
#5  XXX   1.0
#6  YYY   1.0
#Warning message:
#joining factors with different levels, coercing to character vector 

Note that you end up with a character column (rate) and a numeric column (value) and all NAs are replaced by 1.

str(left_join_NA(x = fx, y = lookup, by = "rate"))
#'data.frame':  6 obs. of  2 variables:
# $ rate : chr  "USD" "MYR" "USD" "MYR" ...
# $ value: num  0.9 1.1 0.9 1.1 1 1



回答2:


If you're using dplyr anyway, you might as well take advantage of dplyr::coalesce, and use the dplyr syntax to pass into that a 1 or 0. I think this looks nice...

... %>%
mutate_if(is.numeric,coalesce,0)

Where the 0 is the arg passed to dplyr::coalesce to replace NAs.

In the example in the question, there are dataframes with factors. I feel confident one would not have FX rates as factors, or another vector in which you'd replace NA with zero, so I go ahead and add that step below just to make the answer executable after the provided example.

# replace NAs with zeros for all numeric columns
#
# ... code from question above
left_join(x=fx,y=lookup,by=c("rate")) %>%
    # ignore if factors in value column are because it's a toy example
    mutate(value = as.numeric(as.character(value))) %>%
    # the good stuff here
    mutate_if(is.numeric,coalesce,0)



回答3:


I stumbled on the same problem with dplyr and wrote a small function that solved my problem. (the solution requires tidyr and dplyr)

left_join0 <- function(x, y, fill = 0L){
  z <- left_join(x, y)
  tmp <- setdiff(names(z), names(x))
  z <- replace_na(z, setNames(as.list(rep(fill,   length(tmp))), tmp))
  z
}

Originally answered at: R Left Outer Join with 0 Fill Instead of NA While Preserving Valid NA's in Left Table



来源:https://stackoverflow.com/questions/28992362/dplyr-join-define-na-values

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!