Using R: delete rows when a value is repeated fewer than 3 times

没有蜡笔的小新 2020-12-16 22:10

I have a data frame with 10 rows and 3 columns, and I want to delete the rows whose value in column c occurs fewer than 3 times:

    a   b c
1   1 201 1
2   2 202 1
3   3 203 1
4   4 204 1
5   5 205 4
6   6 206 5
7   7 207 4
8   8 208 4
9   9 209 8
10 10 210 5


        
4 Answers
  • 2020-12-16 22:43

    Here is a solution using ave:

    Data[ave(Data$c, Data$c, FUN = length) > 2, ]
    

    or using ave with subset:

    subset(Data, ave(c, c, FUN = length) > 2)
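
    To see why the ave condition works, it can help to look at the intermediate vector. The sketch below rebuilds the example data by hand (the tenth row, 10 210 5, is assumed from another answer's copy of the data) and stores the per-row group sizes in counts, a name chosen only for illustration.

```r
# Example data; row 10 (10, 210, 5) is assumed from another answer's copy.
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# For each row, ave() applies length() to the group of rows sharing its c value,
# so counts[i] is the number of times Data$c[i] occurs in the column.
counts <- ave(Data$c, Data$c, FUN = length)

# Keep only the rows whose c value occurs more than twice.
kept <- Data[counts > 2, ]
print(counts)
print(kept)
```

    Because counts has one entry per row of Data, it can be used directly as a logical row filter without any merge or lookup step.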
    
  • 2020-12-16 22:53

    Correct me if I'm wrong, but it seems like you want all the rows where the value in column c occurs more than twice. "Repeated" makes me think that they need to occur consecutively, which is what rle is for, but you would only want rows 1-4 if that was what you were trying to do.
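
    For comparison, the consecutive reading could be sketched with rle as follows. This is only an illustration of that alternative interpretation, not what the question appears to ask for; run_len and consecutive are hypothetical names, and the data frame is rebuilt by hand from the example.

```r
# Example data rebuilt by hand from the thread.
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# rle() encodes column c as consecutive runs (values plus run lengths).
runs <- rle(Data$c)

# Expand each run length back out so every row carries the length of its run.
run_len <- rep(runs$lengths, runs$lengths)

# Keep rows that sit inside a run of more than 2 identical consecutive values.
consecutive <- Data[run_len > 2, ]
print(consecutive)
```

    With this data only the initial run of four 1s qualifies, so the result is rows 1-4.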

    That said, the code below finds the rows where the value in column c occurs more than 2 times. I'm sure this can be done more elegantly, but it works.

    lines <-
    "a   b c
    1 201 1
    2 202 1
    3 203 1
    4 204 1
    5 205 4
    6 206 5
    7 207 4
    8 208 4
    9 209 8
    10 210 5"
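    # Read the example data from the string above.
    Data <- read.table(text = lines, header = TRUE)
    # Count how often each value of c occurs.
    cVals <- data.frame(table(Data$c))
    # Flag rows whose c value occurs more than twice, then keep them.
    Rows <- Data$c %in% cVals[cVals$Freq > 2, 1]
    Data[Rows, ]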
    #  a   b c
    #1 1 201 1
    #2 2 202 1
    #3 3 203 1
    #4 4 204 1
    #5 5 205 4
    #7 7 207 4
    #8 8 208 4
    
  • 2020-12-16 22:59

    Using unsplit is probably the easiest way to project a grouped aggregate (in this case using table to get counts, but see tapply for the general case) out to the original data.

    subset(Data, with(Data, unsplit(table(c), c)) >= 3)
    

    Equivalently and more similar to Erik's:

    Data[unsplit(table(Data$c), Data$c) >= 3, ]
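
    As a rough illustration of what "projecting out" means here, the sketch below stores the result of unsplit in a variable (counts, a name used only for this example) so it can be inspected before filtering. The data frame is rebuilt by hand from the example in the thread.

```r
# Example data rebuilt by hand from the thread.
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# table() gives one count per distinct value of c, ordered by value.
per_value <- table(Data$c)

# unsplit() spreads those per-group counts back over the rows,
# using Data$c as the grouping, so there is one count per row.
counts <- unsplit(per_value, Data$c)

# Filter on the per-row counts.
kept <- Data[counts >= 3, ]
print(counts)
print(kept)
```

    Since the counts are whole numbers, the condition >= 3 selects the same rows as > 2 in the other answers.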
    
  • 2020-12-16 23:04

    Building on Joshua's answer:

    Data[Data$c %in% names(which(table(Data$c) > 2)), ]
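
    The one-liner can be unpacked step by step, which may make it easier to follow. In this sketch keep_vals and kept are illustrative names, and the data frame is rebuilt by hand from the example in the thread.

```r
# Example data rebuilt by hand from the thread.
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# Count occurrences of each distinct value of c.
freq <- table(Data$c)

# which() keeps the entries whose count exceeds 2; names() recovers the
# corresponding c values as character strings.
keep_vals <- names(which(freq > 2))

# %in% compares the numeric column against those strings by coercing
# both sides to a common type, so 1 matches "1".
kept <- Data[Data$c %in% keep_vals, ]
print(keep_vals)
print(kept)
```

    Note that the match goes through character strings, so this relies on the column values printing the same way as the table names, which holds for the whole numbers used here.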
    