Removing specific rows from a dataframe

刺人心 2020-11-29 03:54

I have a data frame e.g.:

sub   day
1      1
1      2
1      3
1      4
2      1
2      2
2      3
2      4
3      1
3      2
3      3
3      4

I want to remove specific rows identified by a combination of sub and day, for example the rows where sub is 1 and day is 2, and where sub is 3 and day is 4.
4 Answers
  • 2020-11-29 03:57
    DF[ ! ( ( DF$sub ==1 & DF$day==2) | ( DF$sub ==3 & DF$day==4) ) , ]   # note the ! (negation)
    

    Or if sub is a factor as suggested by your use of quotes:

    DF[ ! paste(DF$sub,DF$day,sep="_") %in% c("1_2", "3_4"), ]   # inside "[" the columns must be qualified with DF$
    

    Could also use subset:

    subset(DF,  ! paste(sub,day,sep="_") %in% c("1_2", "3_4") )
    

    (And I endorse the use of which in Dirk's answer when using "[" even though some claim it is not needed.)
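
    As a quick check of the two approaches above, here is a minimal sketch that rebuilds the question's example data and confirms that the negated compound condition and the pasted key select the same rows. The names keep_logical, keep_key, res1 and res2 are illustrative and not part of the original answer.

    ```r
    # Rebuild the question's example data: 3 subjects, 4 days each.
    DF <- data.frame(sub = rep(1:3, each = 4), day = rep(1:4, times = 3))

    # Approach 1: negate an explicit compound condition.
    keep_logical <- !((DF$sub == 1 & DF$day == 2) | (DF$sub == 3 & DF$day == 4))

    # Approach 2: build a "sub_day" key per row and drop the listed keys.
    key <- paste(DF$sub, DF$day, sep = "_")
    keep_key <- !(key %in% c("1_2", "3_4"))

    res1 <- DF[keep_logical, ]
    res2 <- DF[keep_key, ]

    identical(res1, res2)  # TRUE: both approaches keep the same rows
    nrow(res1)             # 10: the original 12 rows minus the 2 removed
    ```

    The pasted key is convenient when there are many pairs to drop, but it only works if the separator cannot occur inside the values themselves.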

  • 2020-11-29 03:58

    Here's a solution to your problem using dplyr's filter function.

    Although you can pass your data frame as the first argument to any dplyr function, I've used its %>% operator, which pipes your data frame to one or more dplyr functions (just filter in this case).

    Once you are somewhat familiar with dplyr, the cheat sheet is very handy.

    > library(dplyr)
    > print(df <- data.frame(sub=rep(1:3, each=4), day=1:4))
       sub day
    1    1   1
    2    1   2
    3    1   3
    4    1   4
    5    2   1
    6    2   2
    7    2   3
    8    2   4
    9    3   1
    10   3   2
    11   3   3
    12   3   4
    > print(df <- df %>% filter(!((sub==1 & day==2) | (sub==3 & day==4))))
       sub day
    1    1   1
    2    1   3
    3    1   4
    4    2   1
    5    2   2
    6    2   3
    7    2   4
    8    3   1
    9    3   2
    10   3   3
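
    If the list of pairs to remove grows, spelling each one out inside filter becomes unwieldy. One possible alternative, sketched here under the assumption that the unwanted pairs can be kept in a small lookup table (the name to_drop is hypothetical), is dplyr's anti_join, which keeps only the rows that have no match in the second table.

    ```r
    library(dplyr)

    # Same data as in the example above.
    df <- data.frame(sub = rep(1:3, each = 4), day = rep(1:4, times = 3))

    # Hypothetical lookup table: one row per (sub, day) pair to remove.
    to_drop <- data.frame(sub = c(1L, 3L), day = c(2L, 4L))

    # anti_join() keeps the rows of df that have no match in to_drop.
    result <- anti_join(df, to_drop, by = c("sub", "day"))
    ```

    This gives the same 10 rows as the filter call, and adding another pair only means adding a row to to_drop.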
    
  • 2020-11-29 04:08

    This boils down to two distinct steps:

    1. Figure out when your condition is true, and hence compute a vector of booleans, or, as I prefer, their indices by wrapping it into which()
    2. Create an updated data.frame by excluding the indices from the previous step.

    Here is an example:

    R> set.seed(42)
    R> DF <- data.frame(sub=rep(1:4, each=4), day=sample(1:4, 16, replace=TRUE))
    R> DF
       sub day
    1    1   4
    2    1   4
    3    1   2
    4    1   4
    5    2   3
    6    2   3
    7    2   3
    8    2   1
    9    3   3
    10   3   3
    11   3   2
    12   3   3
    13   4   4
    14   4   2
    15   4   2
    16   4   4
    R> ind <- which(with( DF, sub==2 & day==3 ))
    R> ind
    [1] 5 6 7
    R> DF <- DF[ -ind, ]
    R> table(DF)
       day
    sub 1 2 3 4
      1 0 1 0 3
      2 1 0 0 0
      3 0 1 3 0
      4 0 2 0 2
    R> 
    

    And we see that sub==2 has only one entry remaining with day==1.

    Edit: The compound condition can be done with an 'or' as follows:

    ind <- which(with( DF, (sub==1 & day==2) | (sub==3 & day==4) ))
    

    and here is a new full example

    R> set.seed(1)
    R> DF <- data.frame(sub=rep(1:4, each=5), day=sample(1:4, 20, replace=TRUE))
    R> table(DF)
       day
    sub 1 2 3 4
      1 1 2 1 1
      2 1 0 2 2
      3 2 1 1 1
      4 0 2 1 2
    R> ind <- which(with( DF, (sub==1 & day==2) | (sub==3 & day==4) ))
    R> ind
    [1]  1  2 15
    R> DF <- DF[-ind, ]
    R> table(DF)
       day
    sub 1 2 3 4
      1 1 0 1 1
      2 1 0 2 2
      3 2 1 1 0
      4 0 2 1 2
    R> 
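
    Two edge cases are worth keeping in mind when combining which() with negative indexing. The sketch below is not from the original answer; it uses the question's data and illustrative names (wrong, safe, DF_na) to show what happens when no row matches, and how which() differs from a bare logical index when the condition contains NA.

    ```r
    DF <- data.frame(sub = rep(1:3, each = 4), day = rep(1:4, times = 3))

    # A condition that matches no rows (there is no sub == 9).
    ind <- which(DF$sub == 9 & DF$day == 1)

    # Pitfall: -integer(0) is still integer(0), so this selects zero rows
    # instead of leaving DF untouched.
    wrong <- DF[-ind, ]

    # Guarded version: only drop rows when there is something to drop.
    safe <- if (length(ind) > 0) DF[-ind, ] else DF

    # NA handling: make one day value missing, then test sub == 1 & day == 2.
    DF_na <- DF
    DF_na$day[3] <- NA
    cond <- DF_na$sub == 1 & DF_na$day == 2   # TRUE at row 2, NA at row 3

    # which() ignores the NA, so row 3 is kept as it is.
    via_which <- DF_na[-which(cond), ]

    # A logical index containing NA produces a row filled with NA instead.
    via_logical <- DF_na[!cond, ]
    ```

    The length check matters mostly in scripts and functions, where the condition may legitimately match nothing.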
    
  • 2020-11-29 04:13

    One simple solution:

    cond1 <- df$sub == 1 & df$day == 2

    cond2 <- df$sub == 3 & df$day == 4

    df <- df[!(cond1 | cond2),]
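
    The same pattern extends to any number of pairs. As a rough sketch, assuming the pairs are held in a list (pairs, conds and drop are hypothetical names, not part of the answer above), one condition can be built per pair and the conditions combined with Reduce:

    ```r
    df <- data.frame(sub = rep(1:3, each = 4), day = rep(1:4, times = 3))

    # Hypothetical list of (sub, day) pairs to remove.
    pairs <- list(c(sub = 1, day = 2), c(sub = 3, day = 4))

    # One logical vector per pair, analogous to cond1 and cond2.
    conds <- lapply(pairs, function(p) df$sub == p[["sub"]] & df$day == p[["day"]])

    # OR all conditions together, then keep the rows matching none of them.
    drop <- Reduce(`|`, conds)
    result <- df[!drop, ]
    ```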
