Question
I am trying to remove rows based on whether columns 2 and 3 contain 0's. I keep getting very strange results. I initially tried to write it without subset because I read somewhere that subset should only be used for small amounts of data because of the memory cost. Neither attempt worked for me, however. Can someone explain what I did wrong?
df <- data.frame(val1=c(1,2,3), val2=c(4,0,5), val3=c(3,0,6))
subset(df,df>0,c(2,3))
data.frame(df[df[,c(2,3)]!=0])
starting dataframe:
val1 val2 val3
1 1 4 3
2 2 0 0
3 3 5 6
end goal:
val1 val2 val3
1 1 4 3
3 3 5 6
Answer 1:
Using subset, we create a logical index based on the second and third columns:
subset(df, subset=!(val2==0|val3==0))
This works because the subset argument expects a logical vector with one value per row (built here from the columns), not a logical matrix.
We can also use [ instead of subset:
df[!(df[,2]==0|df[,3]==0),]
Regarding the second attempt in the OP's post:
df[,c(2,3)]!=0 #returns a matrix
# val2 val3
#[1,] TRUE TRUE
#[2,] FALSE FALSE
#[3,] TRUE TRUE
To subset rows, we need a single logical value per row, not one per cell.
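One way to get from the logical matrix to a per-row index is to collapse it across columns. The following is a minimal sketch, assuming the goal is to keep only rows where every selected column is non-zero; the names m and keep_all are illustrative and not from the original post.

```r
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))

# Logical matrix: one TRUE/FALSE per cell of columns 2 and 3
m <- df[, c(2, 3)] != 0

# Collapse to one value per row: TRUE only if all selected columns are non-zero
keep_all <- rowSums(m) == ncol(m)

# Note the trailing comma: the index selects rows, all columns are kept
df[keep_all, ]
```

This should give the same rows as the | versions above, and it extends to more columns by changing only the column selection.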
Another option is rowSums, if you want to remove only the rows that are 0 in both columns 2 and 3:
df[rowSums(df[2:3])!=0,]
For example, after
df$val3[2] <- 2
the rowSums approach returns all three rows, while the other methods still return only rows 1 and 3.
The equivalent option with subset uses & instead of |:
subset(df, !(val2==0 & val3==0))
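One caveat worth noting: summing the values themselves only matches the & version when the columns cannot cancel each other out (for example, when they are all non-negative). A sketch of a variant that counts non-zero cells instead is shown below; the extra row and the names df2, by_sum and by_count are hypothetical, added only to illustrate the difference.

```r
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))

# Hypothetical extra row: val2 and val3 are both non-zero but sum to 0
df2 <- rbind(df, data.frame(val1 = 4, val2 = -1, val3 = 1))

# Summing the values treats the new row as if both columns were 0
by_sum <- df2[rowSums(df2[2:3]) != 0, ]

# Counting non-zero cells keeps every row with at least one non-zero value
by_count <- df2[rowSums(df2[2:3] != 0) > 0, ]
```

With the original data both forms select the same rows, so the counting form is only needed if negative values are possible.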
Source: https://stackoverflow.com/questions/32851588/getting-subset-of-of-data-based-on-multiple-column-values