Question
Suppose I have a dataset with 100-odd columns, and I need to keep only those rows that meet one condition applied across all 100 columns. How do I do this?
Suppose it looks like the example below. I need to keep only the rows where at least one of Col1, Col2, Col3, or Col4 is > 0.
Col1 Col2 Col3 Col4
1 1 3 4
0 0 4 2
4 3 4 3
2 1 0 2
1 2 0 3
0 0 0 0
In the above example, all rows except the last one will make it. I need to place the results in the same data frame as the original. I'm not sure whether I can use lapply to loop through the columns and check for > 0, or whether I can use subset. Any help is appreciated.
Can I use column indices and do df <- subset(df, c(2:100) > 0)? This doesn't give me the right result.
Answer 1:
Suppose your data.frame is DF; then indexing with [ will do the work for you.
> DF[DF[,1]>0 | DF[,2] >0 | DF[,3] >0 | DF[,4] >0, ]
Col1 Col2 Col3 Col4
1 1 1 3 4
2 0 0 4 2
3 4 3 4 3
4 2 1 0 2
5 1 2 0 3
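If you have hundreds of columns, you can use this alternative approach. It assumes the data contains no negative values, so a row sum of zero means every entry in that row is zero.
> DF[rowSums(DF) != 0, ]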
Col1 Col2 Col3 Col4
1 1 1 3 4
2 0 0 4 2
3 4 3 4 3
4 2 1 0 2
5 1 2 0 3
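If the data might contain negative values or NAs, a row sum is no longer a reliable test, because positive and negative entries can cancel out. A minimal sketch of a more defensive variant counts, per row, how many of the selected columns are > 0. The names `cols` and `keep` are illustrative and not part of the original answer. It also shows why `subset(df, c(2:100) > 0)` from the question does not work: `c(2:100) > 0` compares the integers 2 through 100 with zero, not the column values.

```r
# Example data from the question
DF <- data.frame(
  Col1 = c(1, 0, 4, 2, 1, 0),
  Col2 = c(1, 0, 3, 1, 2, 0),
  Col3 = c(3, 4, 4, 0, 0, 0),
  Col4 = c(4, 2, 4, 2, 3, 0)
)

# Columns to test, given as indices; here all of them,
# but this could be a range such as 2:100 on a wider data frame
cols <- seq_along(DF)

# DF[cols] > 0 is a logical matrix; rowSums() counts the TRUEs per row.
# na.rm = TRUE treats a missing value as "not > 0".
keep <- rowSums(DF[cols] > 0, na.rm = TRUE) > 0

# Overwrite the original data frame with the filtered rows
DF <- DF[keep, , drop = FALSE]
```

Only the column selection and the comparison inside `rowSums()` need to change for a different set of columns or a different condition.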
Answer 2:
dat <- read.table(header = TRUE, text = "
Col1 Col2 Col3 Col4
1 1 3 4
0 0 4 2
4 3 4 3
2 1 0 2
1 2 0 3
0 0 0 0
")
You can use data.table to automatically accommodate however many columns your data.frame happens to have. Here's one way, though there's probably a more elegant method of doing this with data.table:
require(data.table)
dt <- data.table(dat)
dt[rowSums(dt>0)>0]
# Col1 Col2 Col3 Col4
# 1: 1 1 3 4
# 2: 0 0 4 2
# 3: 4 3 4 3
# 4: 2 1 0 2
# 5: 1 2 0 3
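If data.table is not available, the same "any column > 0" test can be sketched in base R without converting the data to a matrix: build one logical vector per column and combine them with an element-wise OR. This is an illustrative alternative rather than part of the original answer, and the names `positive`, `keep`, and `result` are made up for the example.

```r
dat <- read.table(header = TRUE, text = "
Col1 Col2 Col3 Col4
1 1 3 4
0 0 4 2
4 3 4 3
2 1 0 2
1 2 0 3
0 0 0 0
")

# One logical vector per column: is the value present and positive?
positive <- lapply(dat, function(x) !is.na(x) & x > 0)

# Combine the per-column vectors with element-wise OR,
# so a row is kept if any of its columns is > 0
keep <- Reduce(`|`, positive)

result <- dat[keep, ]
```

Replacing `|` with `&` in the `Reduce()` call would keep only the rows where every column is > 0.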
Source: https://stackoverflow.com/questions/18589595/filter-rows-based-on-multiple-column-conditions-r