suppress NAs in paste()

删除回忆录丶 提交于 2019-11-26 14:37:26

For the purpose of a "true-NA": Seems the most direct route is just to modify the value returned by paste2 to be NA when the value is ""

 paste3 <- function(...,sep=", ") {
     L <- list(...)
     L <- lapply(L,function(x) {x[is.na(x)] <- ""; x})
     ret <-gsub(paste0("(^",sep,"|",sep,"$)"),"",
                 gsub(paste0(sep,sep),sep,
                      do.call(paste,c(L,list(sep=sep)))))
     is.na(ret) <- ret==""
     ret
     }
 val<- paste3(c("a","b", "c", NA), c("A","B", NA, NA))
 val
#[1] "a, A" "b, B" "c"    NA    

I know this question is many years old, but it's still the top google result for r paste na. I was looking for a quick solution to what I assumed was a simple problem, and was somewhat taken aback by the complexity of the answers. I opted for a different solution, and am posting it here in case anyone else is interested.

bar <- apply(cbind(1:4, foo), 1, function(x) paste(x[!is.na(x)], collapse = ", "))
bar
[1] "1, A" "2, B" "3, C" "4"

In case it isn't obvious, this will work on any number of vecotrs with NAs in any positions.

IMHO, the advantage of this over the existing answers is legibility. It's a one-liner, which is always nice, and it doesn't rely on a bunch of regexes and if/else statements which may trip up your colleagues or future self. Erik Shitts' answer mostly shares these advantages, but assumes there are only two vectors and that only the last of them contains NAs.

My solution doesn't satisfy the requirement in your edit, because my project has the opposite requirement. However, you can easily solve this by adding a second line borrowed from 42-'s answer:

is.na(bar) <- bar == ""
Ben Bolker

A function that follows up on @ErikShilt's answer and @agstudy's comment. It generalizes the situation slightly by allowing sep to be specified and handling cases where any element (first, last, or intermediate) is NA. (It might break if there are multiple NA values in a row, or in other tricky cases ...) By the way, note that this situation is described exactly in the second paragraph of the Details section of ?paste, which indicates that at least the R authors are aware of the situation (although no solution is offered).

paste2 <- function(...,sep=", ") {
    L <- list(...)
    L <- lapply(L,function(x) {x[is.na(x)] <- ""; x})
    gsub(paste0("(^",sep,"|",sep,"$)"),"",
                gsub(paste0(sep,sep),sep,
                     do.call(paste,c(L,list(sep=sep)))))
}
foo <- c(LETTERS[1:3],NA)
bar <- c(NA,2:4)
baz <- c("a",NA,"c","d")
paste2(foo,bar,baz)
# [1] "A, a"    "B, 2"    "C, 3, c" "4, d"   

This doesn't handle @agstudy's suggestions of (1) incorporating the optional collapse argument; (2) making NA-removal optional by adding an na.rm argument (and setting the default to FALSE to make paste2 backward compatible with paste). If one wanted to make this more sophisticated (i.e. remove multiple sequential NAs) or faster it might make sense to write it in C++ via Rcpp (I don't know much about C++'s string-handling, but it might not be too hard -- see convert Rcpp::CharacterVector to std::string and Concatenating strings doesn't work as expected for a start ...)

JWilliman

As Ben Bolker mentioned the above approaches may fall over if there are multiple NAs in a row. I tried a different approach that seems to overcome this.

paste4 <- function(x, sep = ", ") {
  x <- gsub("^\\s+|\\s+$", "", x) 
  ret <- paste(x[!is.na(x) & !(x %in% "")], collapse = sep)
  is.na(ret) <- ret == ""
  return(ret)
  }

The second line strips out extra whitespace introduced when concatenating text and numbers. The above code can be used to concatenate multiple columns (or rows) of a dataframe using the apply command, or repackaged to first coerce the data into a dataframe if needed.

EDIT

After a few more hours thought I think the following code incorporates the suggestions above to allow specification of the collapse and na.rm options.

paste5 <- function(..., sep = " ", collapse = NULL, na.rm = F) {
  if (na.rm == F)
    paste(..., sep = sep, collapse = collapse)
  else
    if (na.rm == T) {
      paste.na <- function(x, sep) {
        x <- gsub("^\\s+|\\s+$", "", x)
        ret <- paste(na.omit(x), collapse = sep)
        is.na(ret) <- ret == ""
        return(ret)
      }
      df <- data.frame(..., stringsAsFactors = F)
      ret <- apply(df, 1, FUN = function(x) paste.na(x, sep))

      if (is.null(collapse))
        ret
      else {
        paste.na(ret, sep = collapse)
      }
    }
}

As above, na.omit(x) can be replaced with (x[!is.na(x) & !(x %in% "") to also drop empty strings if desired. Note, using collapse with na.rm = T returns a string without any "NA", though this could be changed by replacing the last line of code with paste(ret, collapse = collapse).

nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)))
mnth <- month.abb
nth[4:5] <- NA
mnth[5:6] <- NA

paste5(mnth, nth)
[1] "Jan 1st"  "Feb 2nd"  "Mar 3rd"  "Apr NA"   "NA NA"    "NA 6th"   "Jul 7th"  "Aug 8th"  "Sep 9th"  "Oct 10th" "Nov 11th" "Dec 12th"

paste5(mnth, nth, sep = ": ", collapse = "; ", na.rm = T)
[1] "Jan: 1st; Feb: 2nd; Mar: 3rd; Apr; 6th; Jul: 7th; Aug: 8th; Sep: 9th; Oct: 10th; Nov: 11th; Dec: 12th"

paste3(c("a","b", "c", NA), c("A","B", NA, NA), c(1,2,NA,4), c(5,6,7,8))
[1] "a, A, 1, 5" "b, B, 2, 6" "c, , 7"     "4, 8" 

paste5(c("a","b", "c", NA), c("A","B", NA, NA), c(1,2,NA,4), c(5,6,7,8), sep = ", ", na.rm = T)
[1] "a, A, 1, 5" "b, B, 2, 6" "c, 7"       "4, 8" 

You can use ifelse, a vectorized if-else construct to determine if a value is NA and substitute a blank. You'll then use gsub to strip out the trailing ", " if it isn't followed by any other string.

gsub(", $", "", paste(1:4, ifelse(is.na(foo), "", foo), sep = ", "))

Your answer is correct. There isn't a better way to do it. This issue is explicitly mentioned in the paste documentation in the Details section.

Or do a mutate after paste() and remove NAs:

data <- data.frame(col1= c(rep(NA, 5)), col2 = c(2:6)) %>%
  mutate(col3 = paste(col1, col2)) %>%
  mutate(col3 = gsub('NA', '', col3))

Or remove the NAs after paste with str_replace_all

data$1 <- str_replace_all(data$1, "NA", "")
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!