I have a very large data frame (df) with approximately 35-45 columns (variables) and more than 300 rows. Some of the rows contain NA, NaN, Inf, or -Inf values in
To remove the rows with +/-Inf, I'd suggest the following:
df <- df[!is.infinite(rowSums(df)),]
or, alternatively,
df <- df[is.finite(rowSums(df)),]
The second option (the one with is.finite() and without the negation) also removes rows containing NA values, in case this has not already been done.
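One caveat with the rowSums() approach, sketched here on a made-up data frame (the name `demo` and its values are assumptions for illustration): a row holding both Inf and -Inf sums to NaN, so is.infinite() misses it while is.finite() still catches it. Both forms also assume that every column is numeric.

```r
# Hypothetical data: row 2 has Inf, row 3 has both Inf and -Inf, row 4 has NA
demo <- data.frame(x = c(1, Inf, Inf, NA), y = c(2, 3, -Inf, 4))

# row 3 sums to NaN (Inf + -Inf), row 4 sums to NA
s <- rowSums(demo)

kept_infinite <- demo[!is.infinite(s), ]  # drops only row 2
kept_finite   <- demo[is.finite(s), ]     # drops rows 2, 3 and 4
```

So the two forms agree only when no row mixes Inf with -Inf and there are no NA or NaN values.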
I had this problem and none of the above solutions worked for me. I used the following to remove rows with +/-Inf in columns 15 and 16 of my dataframe.
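# compare numerically, one column at a time; subset() also drops rows where the test is NA
d <- subset(c, c[, 15] != -Inf & c[, 16] != -Inf)
e <- subset(d, d[, 15] != Inf & d[, 16] != Inf)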
It took me a while to work this out for dplyr 1.0.0, so I thought I would put up a new version of @sbha's solutions using c_across, since filter_all, filter_if and filter_at have been superseded.
library(dplyr)
df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 2 Inf 10 b
# 3 3 8 Inf c
# 4 NA 8 11 d
df %>%
  rowwise() %>%
  filter(!all(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 4 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 2 Inf 10 b
# 3 3 8 Inf c
# 4 NA 8 11 d
df %>%
  rowwise() %>%
  filter(!any(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 2 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 NA 8 11 d
df %>%
  rowwise() %>%
  filter(!any(is.infinite(c_across(a:c))))
# # A tibble: 2 x 4
# # Rowwise:
# a b c d
# <dbl> <dbl> <dbl> <chr>
# 1 1 5 9 a
# 2 NA 8 11 d
To be honest, I think @sbha's answer is simpler!
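For dplyr 1.0.4 or later, a sketch of the same filter without rowwise() is possible with if_all(), which keeps a row only when the predicate holds for every selected column. This assumes that version of dplyr is installed and reuses the sample tibble from this answer; the result names `no_inf` and `all_finite` are made up for illustration.

```r
library(dplyr)

df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8),
             c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))

# keep rows where no numeric column is infinite (NA is not infinite, so row 4 stays)
no_inf <- df %>%
  filter(if_all(where(is.numeric), ~ !is.infinite(.x)))

# stricter: keep rows where every numeric column is finite (also drops NA and NaN)
all_finite <- df %>%
  filter(if_all(where(is.numeric), is.finite))
```

The first call keeps rows a and d, like the !any(is.infinite(...)) version above; the second keeps only row a.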
To keep the rows without Inf we can do:
df[apply(df, 1, function(x) all(is.finite(x))), ]
NAs are handled by this as well, because is.finite() returns FALSE for NA, so all() is FALSE and the row is dropped. Rows with NaN are dropped for the same reason.
set.seed(24)
df <- as.data.frame(matrix(sample(c(0:9, NA, -Inf, Inf, NaN), 20*5, replace=TRUE), ncol=5))
df2 <- df[apply(df, 1, function(x) all(is.finite(x))), ]
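Note that apply() first converts the data frame to a matrix, so a single character column turns every value into a string, and is.finite() is FALSE for strings. A minimal sketch of restricting the check to the numeric columns (the data frame `mixed` and the helper vector `num_cols` are made up for illustration):

```r
mixed <- data.frame(a = c(1, 2, NA, 4), b = c(5, Inf, 7, 8),
                    id = c("w", "x", "y", "z"))

# flag the numeric columns, then test only those
num_cols <- vapply(mixed, is.numeric, logical(1))
keep <- apply(mixed[num_cols], 1, function(x) all(is.finite(x)))

clean <- mixed[keep, ]  # rows 1 and 4 remain, the id column is preserved
```

Running the original one-liner on `mixed` directly would return zero rows, because every cell is compared as a string.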
Here are the results of the different is.*() functions:
x <- c(42, NA, NaN, Inf)
is.finite(x)
# [1] TRUE FALSE FALSE FALSE
is.na(x)
# [1] FALSE TRUE TRUE FALSE
is.nan(x)
# [1] FALSE FALSE TRUE FALSE
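For completeness, a short sketch that adds is.infinite() and -Inf to the comparison; the vector `y` is made up for illustration. It shows why !is.finite() and is.infinite() are not interchangeable.

```r
y <- c(42, NA, NaN, Inf, -Inf)
is.infinite(y)
# [1] FALSE FALSE FALSE  TRUE  TRUE
!is.finite(y)   # also TRUE for NA and NaN, unlike is.infinite()
# [1] FALSE  TRUE  TRUE  TRUE  TRUE
```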
Depending on the data, there are a couple of options using scoped variants of dplyr::filter() and is.finite() or is.infinite() that might be useful:
library(dplyr)
# sample data
df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
# across all columns:
df %>%
filter_all(all_vars(!is.infinite(.)))
# note that is.finite() is FALSE for NA and for strings, so with the character column d this returns no rows:
df %>%
filter_all(all_vars(is.finite(.)))
# checking only numeric columns:
df %>%
filter_if(~is.numeric(.), all_vars(!is.infinite(.)))
# checking only select columns, in this case a through c:
df %>%
filter_at(vars(a:c), all_vars(!is.infinite(.)))
is.finite works on a vector and not on a data.frame object. So, we can loop through the data.frame using lapply and get only the 'finite' values.
lapply(df, function(x) x[is.finite(x)])
If the number of Inf and -Inf values differs between columns, the above code returns a list whose elements have unequal lengths, so it may be better to leave it as a list. If we want a data.frame, the columns must have equal lengths.
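A small sketch of that unequal-length situation, using a made-up data frame `uneven` (an assumption for illustration):

```r
# x has one infinite value, y has two
uneven <- data.frame(x = c(1, Inf, 3), y = c(-Inf, Inf, 6))

finite_vals <- lapply(uneven, function(v) v[is.finite(v)])

# x keeps 2 values and y keeps 1, so the list cannot be turned back
# into a data.frame without padding or dropping values
lengths(finite_vals)
```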
If we want to remove rows containing any NA or Inf/-Inf values:
df[Reduce(`&`, lapply(df, function(x) !is.na(x) & is.finite(x))),]
Or a compact option by @nicola
df[Reduce(`&`, lapply(df, is.finite)),]
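The two forms select the same rows because is.finite() is already FALSE for NA and NaN, which makes the extra !is.na() test redundant. A quick check on the sample data from this answer (the names `keep1` and `keep2` are made up for illustration):

```r
set.seed(24)
df <- as.data.frame(matrix(sample(c(1:5, NA, -Inf, Inf),
                                  20*5, replace=TRUE), ncol=5))

# one logical vector per column, combined row by row with `&`
keep1 <- Reduce(`&`, lapply(df, function(x) !is.na(x) & is.finite(x)))
keep2 <- Reduce(`&`, lapply(df, is.finite))

identical(keep1, keep2)  # both masks pick the same rows
```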
If we are ready to use a package, a compact option would be NaRV.omit from IDPmisc:
library(IDPmisc)
NaRV.omit(df)
set.seed(24)
df <- as.data.frame(matrix(sample(c(1:5, NA, -Inf, Inf),
20*5, replace=TRUE), ncol=5))