Subset dataframe such that all values in each row are less than a certain value

问题

I have a dataframe with a dimension column and 4 value columns. How can I subset the column such that all 4 columns for each record are less than a given x? I know I could do this manually using subset and specifying the condition for each column, but is there a way to do it using maybe an apply function? Below is a sample dataframe. For example let's say the x was 0.7. In that case I would want to eliminate any rows where any column of that row is more than 0.7).

   zips ABC DEF GHI JKL
1     1 0.8 0.6 0.1 0.6
2     2 0.1 0.3 0.8 1.0
3     3 0.5 0.1 0.4 0.8
4     4 0.6 0.4 0.2 0.3
5     5 1.0 0.8 0.6 0.5
6     6 0.2 0.7 0.3 0.4
7     7 0.3 1.0 1.0 0.2
8     8 0.7 0.9 0.5 0.1
9     9 0.9 0.5 0.9 0.7
10   10 0.4 0.2 0.7 0.9

The following function seemed to work, but could someone explain the logic here?

Variance_Percentile[!rowSums(Variance_Percentile[-1] > 0.7), ]
  zips ABC DEF GHI JKL
4    4 0.6 0.4 0.2 0.3
6    6 0.2 0.7 0.3 0.4

回答1:

You can use the negated rowSums() for the subset

df[!rowSums(df[-1] > 0.7), ]
#   zips ABC DEF GHI JKL
# 4    4 0.6 0.4 0.2 0.3
# 6    6 0.2 0.7 0.3 0.4

df[-1] > 0.7 gives us a logical matrix telling us which df[-1] are greater than 0.7
rowSums() sums across those rows (each TRUE value is equal to 1, FALSE is zero)
! converts those values to logical and negates them, so that we get any row sums which are zero (FALSE) and turn them into TRUE. In other words, if the rowSums() result is zero, we want those rows.
we use that logical vector for the row subset

Another way to get the same logical vector would be to do

rowSums(df[-1] > 0.7) == 0

来源：https://stackoverflow.com/questions/28180382/subset-dataframe-such-that-all-values-in-each-row-are-less-than-a-certain-value

标签

dataframe

apply