How to remove rows with inf from a dataframe in R

后端未结


I have a very large data frame (df) with approximately 35-45 columns (variables) and more than 300 rows. Some of the rows contain NA, NaN, Inf, -Inf values in


6 answers

渐次进展

2020-12-29 04:09
To remove the rows with +/-Inf I'd suggest the following:
```
df <- df[!is.infinite(rowSums(df)),]
```
or, alternatively,
```
df <- df[is.finite(rowSums(df)),]
```
The second option (the one with is.finite() and without the negation) also removes rows containing NA or NaN values, in case this has not already been done.
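One caveat that may be worth checking on your own data: rowSums() needs every column to be numeric, and a row that contains both Inf and -Inf sums to NaN, which is.infinite() does not flag. A minimal sketch on made-up data (the column names x and y are assumptions, not from the question):

```r
# Hypothetical toy data: row 2 has Inf, row 3 has both Inf and -Inf, row 4 has NA
df <- data.frame(x = c(1, Inf, Inf, NA), y = c(2, 3, -Inf, 4))

rs <- rowSums(df)                 # 3, Inf, NaN, NA

keep_not_inf <- !is.infinite(rs)  # keeps rows 1, 3 and 4
keep_finite  <- is.finite(rs)     # keeps row 1 only

df[keep_not_inf, ]
df[keep_finite, ]
```

So the first form can let a row with both Inf and -Inf slip through, while the second drops it along with the NA row.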

温柔的废话

2020-12-29 04:10
I had this problem and none of the above solutions worked for me. I used the following to remove rows with +/-Inf in columns 15 and 16 of my dataframe.
```
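# `c` is the data frame; compare numerically and test each column separately
d <- subset(c, c[, 15] != -Inf & c[, 16] != -Inf)
e <- subset(d, d[, 15] != Inf & d[, 16] != Inf)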
```

情书的邮戳

2020-12-29 04:13

It took me a while to work this out for dplyr 1.0.0, so I thought I would put up a new version of @sbha's solutions using c_across(), since filter_all() and filter_if() have been superseded.

```
library(dplyr)
df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))
#       a     b     c d
#   <dbl> <dbl> <dbl> <chr>
# 1     1     5     9 a
# 2     2   Inf    10 b
# 3     3     8   Inf c
# 4    NA     8    11 d

# all(): drops a row only if every numeric column is infinite (nothing is dropped here)
df %>%
  rowwise() %>%
  filter(!all(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 4 x 4
# # Rowwise:
#       a     b     c d
#   <dbl> <dbl> <dbl> <chr>
# 1     1     5     9 a
# 2     2   Inf    10 b
# 3     3     8   Inf c
# 4    NA     8    11 d

# any(): drops a row if at least one numeric column is infinite
df %>%
  rowwise() %>%
  filter(!any(is.infinite(c_across(where(is.numeric)))))
# # A tibble: 2 x 4
# # Rowwise:
#       a     b     c d
#   <dbl> <dbl> <dbl> <chr>
# 1     1     5     9 a
# 2    NA     8    11 d

# same check, restricted to columns a through c
df %>%
  rowwise() %>%
  filter(!any(is.infinite(c_across(a:c))))
# # A tibble: 2 x 4
# # Rowwise:
#       a     b     c d
#   <dbl> <dbl> <dbl> <chr>
# 1     1     5     9 a
# 2    NA     8    11 d
```

To be honest, I think @sbha's answer is simpler!
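For what it's worth, newer dplyr versions also offer if_any() and if_all(), which avoid rowwise() and are probably faster on large data. A sketch, assuming dplyr 1.0.4 or later and the same toy tibble as above (the result names no_inf and all_finite are just for illustration):

```r
library(dplyr)

df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8),
             c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))

# drop a row if any numeric column is +/-Inf (NA is left alone)
no_inf <- df %>%
  filter(!if_any(where(is.numeric), is.infinite))

# keep only rows where every numeric column is finite (also drops NA/NaN)
all_finite <- df %>%
  filter(if_all(where(is.numeric), is.finite))
```

Here no_inf should keep rows a and d, and all_finite only row a, since the NA in row d is not finite.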


刺人心

2020-12-29 04:28

To keep the rows without Inf we can do:

```
df[apply(df, 1, function(x) all(is.finite(x))), ]
```

NAs are handled by this as well: is.finite(NA) is FALSE, so a row containing NA fails the all() test and is dropped.

Rows with NaN are dropped for the same reason.

```
set.seed(24)
df <- as.data.frame(matrix(sample(c(0:9, NA, -Inf, Inf, NaN),  20*5, replace=TRUE), ncol=5))
df2 <- df[apply(df, 1, function(x) all(is.finite(x))), ]
```

Here are the results of the different is.*() functions:

```
x <- c(42, NA, NaN, Inf)
is.finite(x)
# [1]  TRUE FALSE FALSE FALSE
is.na(x)
# [1] FALSE  TRUE  TRUE FALSE
is.nan(x)
# [1] FALSE FALSE  TRUE FALSE
```
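One thing to watch for: apply() first converts the data frame to a matrix, so if any column is character or factor, every value becomes a string and is.finite() returns FALSE for the whole row. A sketch that restricts the check to the numeric columns, on made-up data (the names below are assumptions for illustration):

```r
# Hypothetical data with one numeric and one character column
df <- data.frame(a = c(1, Inf, 3), b = c("x", "y", "z"))

# naive version: the character column makes every row fail the test
naive <- apply(df, 1, function(x) all(is.finite(x)))

# check only the numeric columns, but keep all columns in the result
num  <- vapply(df, is.numeric, logical(1))
keep <- apply(df[, num, drop = FALSE], 1, function(x) all(is.finite(x)))
df2  <- df[keep, ]
```

With purely numeric data, as in the example above, the one-liner is enough.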


孤街浪徒

2020-12-29 04:35

Depending on the data, there are a couple of options using scoped variants of dplyr::filter() with is.finite() or is.infinite() that might be useful:

```
library(dplyr)

# sample data (tibble() replaces the deprecated data_frame())
df <- tibble(a = c(1, 2, 3, NA), b = c(5, Inf, 8, 8), c = c(9, 10, Inf, 11), d = c('a', 'b', 'c', 'd'))

# across all columns:
df %>%
  filter_all(all_vars(!is.infinite(.)))

# note that is.finite() is FALSE for NA and for strings,
# so with the character column d this drops every row:
df %>%
  filter_all(all_vars(is.finite(.)))

# checking only numeric columns:
df %>%
  filter_if(~is.numeric(.), all_vars(!is.infinite(.)))

# checking only select columns, in this case a through c:
df %>%
  filter_at(vars(a:c), all_vars(!is.infinite(.)))
```


深忆病人

2020-12-29 04:35
is.finite() works on vectors, not on data.frame objects. So we can loop over the columns of the data.frame with lapply() and keep only the finite values.
```
lapply(df, function(x) x[is.finite(x)])
```
If the number of Inf/-Inf values differs between columns, the code above returns a list whose elements have unequal lengths, so it may be better to leave it as a list. A data.frame would need columns of equal length.

If we want to remove rows that contain any NA or Inf/-Inf values:
```
df[Reduce(`&`, lapply(df, function(x) !is.na(x)  & is.finite(x))),]
```
Or a compact option by @nicola
```
df[Reduce(`&`, lapply(df, is.finite)),]
```
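The two versions should select the same rows, since is.finite() is already FALSE for NA and NaN. A quick check on made-up data (the column names and the keep_long/keep_short variables are assumptions for illustration):

```r
# Hypothetical toy data with NA, Inf and NaN in different rows
df <- data.frame(a = c(1, NA, Inf, 4), b = c(2, 3, 4, NaN))

# explicit NA test plus is.finite()
keep_long  <- Reduce(`&`, lapply(df, function(x) !is.na(x) & is.finite(x)))

# is.finite() alone
keep_short <- Reduce(`&`, lapply(df, is.finite))

identical(keep_long, keep_short)
df[keep_short, ]
```

Only the first row survives here, because each of the other rows has a non-finite value in at least one column.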
If we are willing to use a package, a compact option would be NaRV.omit() from IDPmisc
```
library(IDPmisc)
NaRV.omit(df)
```
data
```
set.seed(24)
df <- as.data.frame(matrix(sample(c(1:5, NA, -Inf, Inf), 
                      20*5, replace=TRUE), ncol=5))
```