Subset dataframe such that all values in each row are less than a certain value

Deadly 提交于 2020-01-28 03:01:11

问题


I have a dataframe with a dimension column and 4 value columns. How can I subset the column such that all 4 columns for each record are less than a given x? I know I could do this manually using subset and specifying the condition for each column, but is there a way to do it using maybe an apply function? Below is a sample dataframe. For example let's say the x was 0.7. In that case I would want to eliminate any rows where any column of that row is more than 0.7).

   zips ABC DEF GHI JKL
1     1 0.8 0.6 0.1 0.6
2     2 0.1 0.3 0.8 1.0
3     3 0.5 0.1 0.4 0.8
4     4 0.6 0.4 0.2 0.3
5     5 1.0 0.8 0.6 0.5
6     6 0.2 0.7 0.3 0.4
7     7 0.3 1.0 1.0 0.2
8     8 0.7 0.9 0.5 0.1
9     9 0.9 0.5 0.9 0.7
10   10 0.4 0.2 0.7 0.9

The following function seemed to work, but could someone explain the logic here?

Variance_Percentile[!rowSums(Variance_Percentile[-1] > 0.7), ]
  zips ABC DEF GHI JKL
4    4 0.6 0.4 0.2 0.3
6    6 0.2 0.7 0.3 0.4

回答1:


You can use the negated rowSums() for the subset

df[!rowSums(df[-1] > 0.7), ]
#   zips ABC DEF GHI JKL
# 4    4 0.6 0.4 0.2 0.3
# 6    6 0.2 0.7 0.3 0.4
  • df[-1] > 0.7 gives us a logical matrix telling us which df[-1] are greater than 0.7
  • rowSums() sums across those rows (each TRUE value is equal to 1, FALSE is zero)
  • ! converts those values to logical and negates them, so that we get any row sums which are zero (FALSE) and turn them into TRUE. In other words, if the rowSums() result is zero, we want those rows.
  • we use that logical vector for the row subset

Another way to get the same logical vector would be to do

rowSums(df[-1] > 0.7) == 0


来源:https://stackoverflow.com/questions/28180382/subset-dataframe-such-that-all-values-in-each-row-are-less-than-a-certain-value

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!