How to remove rows with inf from a dataframe in R

自闭症患者 2020-12-29 03:50

I have a very large data frame (df) with approximately 35-45 columns (variables) and more than 300 rows. Some of the rows contain NA, NaN, Inf, or -Inf values in

6 answers
  • 2020-12-29 04:09

    To remove the rows with +/-Inf I'd suggest the following:

    df <- df[!is.infinite(rowSums(df)),]
    

    or, alternatively,

    df <- df[is.finite(rowSums(df)),]
    

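    The second option (the one with is.finite() and without the negation) also removes rows containing NA or NaN values, in case this has not already been done. Both options require every column to be numeric. They also differ for a row that contains both Inf and -Inf: its sum is NaN, so the first option keeps the row while the second drops it.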
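
    The difference between the two forms is easiest to see on a small case. This is a minimal sketch on a made-up data frame (the name `demo` and its values are illustrative assumptions, not data from the question):

    ```r
    # Hypothetical data: row 2 has Inf, row 3 has both Inf and -Inf, row 4 has NA
    demo <- data.frame(x = c(1, Inf, Inf, NA),
                       y = c(2, 3, -Inf, 4))

    sums <- rowSums(demo)   # 3, Inf, NaN, NA

    # is.infinite() flags only row 2, so rows 3 (NaN) and 4 (NA) are kept
    kept_infinite <- demo[!is.infinite(sums), ]

    # is.finite() is FALSE for Inf, NaN and NA, so only row 1 survives
    kept_finite <- demo[is.finite(sums), ]

    nrow(kept_infinite)   # 3
    nrow(kept_finite)     # 1
    ```

    If rows mixing Inf and -Inf can occur, the is.finite() form is probably the safer choice.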

  • 2020-12-29 04:10

    I had this problem and none of the above solutions worked for me. I used the following to remove rows with +/-Inf in columns 15 and 16 of my dataframe.

    # keep rows where neither column 15 nor column 16 is +/-Inf
    keep <- !is.infinite(c[[15]]) & !is.infinite(c[[16]])
    e <- c[keep, ]
    
  • 2020-12-29 04:13

    It took me a while to work this out for dplyr 1.0.0, so I thought I would post a new version of @sbha's solution using c_across(), since filter_all() and filter_if() have been superseded.

    library(dplyr)
    df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
    #       a     b     c d    
    #   <dbl> <dbl> <dbl> <chr>
    # 1     1     5     9 a    
    # 2     2   Inf    10 b    
    # 3     3     8   Inf c    
    # 4    NA     8    11 d 
    
    df %>% 
      rowwise %>% 
      filter(!all(is.infinite(c_across(where(is.numeric)))))
    # # A tibble: 4 x 4
    # # Rowwise: 
    #       a     b     c d    
    #   <dbl> <dbl> <dbl> <chr>
    # 1     1     5     9 a    
    # 2     2   Inf    10 b    
    # 3     3     8   Inf c    
    # 4    NA     8    11 d 
    
    df %>% 
      rowwise %>% 
      filter(!any(is.infinite(c_across(where(is.numeric)))))
    # # A tibble: 2 x 4
    # # Rowwise: 
    #       a     b     c d    
    #   <dbl> <dbl> <dbl> <chr>
    # 1     1     5     9 a    
    # 2    NA     8    11 d 
    
    df %>% 
      rowwise %>% 
      filter(!any(is.infinite(c_across(a:c))))
    
    # # A tibble: 2 x 4
    # # Rowwise: 
    #       a     b     c d    
    #   <dbl> <dbl> <dbl> <chr>
    # 1     1     5     9 a    
    # 2    NA     8    11 d 
    

    To be honest, I think @sbha's answer is simpler!

  • 2020-12-29 04:28

    To keep the rows without Inf we can do:

    df[apply(df, 1, function(x) all(is.finite(x))), ]
    

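    NA and NaN are handled as well: is.finite() returns FALSE for both, so all() is FALSE for any row that contains them and that row is dropped. Note that apply() converts the data frame to a matrix first, so this assumes all columns are numeric.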

    set.seed(24)
    df <- as.data.frame(matrix(sample(c(0:9, NA, -Inf, Inf, NaN),  20*5, replace=TRUE), ncol=5))
    df2 <- df[apply(df, 1, function(x) all(is.finite(x))), ]
    

    Here are the results of the different is.*() functions:

    x <- c(42, NA, NaN, Inf)
    is.finite(x)
    # [1]  TRUE FALSE FALSE FALSE
    is.na(x)
    # [1] FALSE  TRUE  TRUE FALSE
    is.nan(x)
    # [1] FALSE FALSE  TRUE FALSE
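
    One thing to watch with apply(): it converts the data frame to a matrix, so a single character column turns every value into a string, and is.finite() is FALSE for strings. A minimal sketch, assuming a hypothetical mixed-type data frame `mixed`, that restricts the check to the numeric columns:

    ```r
    # Hypothetical mixed-type data: one character column alongside numeric ones
    mixed <- data.frame(a = c(1, 2, NA, 4),
                        b = c(5, Inf, 7, 8),
                        id = c("w", "x", "y", "z"),
                        stringsAsFactors = FALSE)

    # apply() coerces everything to character, so no row passes the check
    naive <- mixed[apply(mixed, 1, function(x) all(is.finite(x))), ]

    # Check only the numeric columns, then index the full data frame
    num_cols <- vapply(mixed, is.numeric, logical(1))
    keep <- apply(mixed[, num_cols, drop = FALSE], 1,
                  function(x) all(is.finite(x)))
    clean <- mixed[keep, ]

    nrow(naive)   # 0
    clean$id      # "w" "z"
    ```

    The non-numeric columns are left out of the test but still appear in the result.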
    
  • 2020-12-29 04:35

    Depending on the data, there are a couple of options using scoped variants of dplyr::filter() with is.finite() or is.infinite() that might be useful:

    library(dplyr)
    
    # sample data
    df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
    
    # across all columns:
    df %>% 
      filter_all(all_vars(!is.infinite(.)))
    
    # note that is.finite() is FALSE for NA and for character values,
    # so with the character column d this returns no rows:
    df %>% 
      filter_all(all_vars(is.finite(.)))
    
    # checking only numeric columns:
    df %>% 
      filter_if(~is.numeric(.), all_vars(!is.infinite(.)))
    
    # checking only select columns, in this case a through c:
    df %>% 
      filter_at(vars(a:c), all_vars(!is.infinite(.)))
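
    In newer dplyr releases the same filters can be written without the scoped verbs. This is a minimal sketch, assuming dplyr 1.0.4 or later (where if_any() and if_all() are available) and the same sample data; the result names `no_inf` and `all_finite` are only illustrative:

    ```r
    library(dplyr)

    df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8),
                 c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))

    # drop rows where any numeric column is +/-Inf (rows with NA are kept)
    no_inf <- df %>%
      filter(!if_any(where(is.numeric), is.infinite))

    # keep only rows where every numeric column is finite (drops NA rows too)
    all_finite <- df %>%
      filter(if_all(where(is.numeric), is.finite))

    no_inf$d       # "a" "d"
    all_finite$d   # "a"
    ```

    Unlike the rowwise() approach, this stays vectorised over columns, which tends to matter on larger data.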
    
  • 2020-12-29 04:35

    is.finite() works on a vector, not on a data.frame object. So we can loop through the columns of the data.frame with lapply() and keep only the finite values.

    lapply(df, function(x) x[is.finite(x)])
    

    If the number of Inf and -Inf values differs between columns, the above code returns a list whose elements have unequal lengths, so it may be better to leave it as a list. A data.frame requires columns of equal length.
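
    To illustrate the unequal lengths, here is a small sketch on a hypothetical two-column data frame (`demo` and its values are made up for this example):

    ```r
    # Hypothetical data: column x has one non-finite value, column y has two
    demo <- data.frame(x = c(1, Inf, 3, 4),
                       y = c(NA, 2, -Inf, 4))

    finite_only <- lapply(demo, function(col) col[is.finite(col)])

    # Each element keeps only its own finite values, so the lengths differ
    lengths(finite_only)
    # x y
    # 3 2
    ```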


    If we want to remove rows containing any NA or Inf/-Inf values:

    df[Reduce(`&`, lapply(df, function(x) !is.na(x)  & is.finite(x))),]
    

    Or a compact option by @nicola

    df[Reduce(`&`, lapply(df, is.finite)),]
    

    If we are willing to use a package, a compact option is NaRV.omit() from IDPmisc:

    library(IDPmisc)
    NaRV.omit(df)
    

    data

    set.seed(24)
    df <- as.data.frame(matrix(sample(c(1:5, NA, -Inf, Inf), 
                          20*5, replace=TRUE), ncol=5))
    