How to find the indices where there are n consecutive zeroes in a row

逝去的感伤 2020-12-19 02:02

Suppose I have this data:

  x = c(14, 14, 6, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 1, 3, 8, 9, 15, 9, 8, 13, 8, 4, 6, 7, 10, 13

  • 2020-12-19 02:31

    If x happens to be a column of a data.table, you can do:

    library(data.table)
    dt <- data.table(x = x)
    
    dt[, if (.N > 3 && all(x == 0)) .(starts = first(.I), ends = last(.I)),
       by = rleid(x)]
    
    #    rleid starts ends
    # 1:     5      6   17
    # 2:    22     34   58
    # 3:    34     72   89
    

    Explanation:

    • rleid(x) gives an ID (integer) for each element in x indicating which "run" the element is a member of, where "run" means a sequence of adjacent equal values.

    • dt[, <code>, by = rleid(x)] partitions dt according to rleid(x) and computes <code> for each subset of dt's rows. The results are stacked together in a single data.table.

    • .N is the number of rows in the current subset.

    • .I is the vector of row numbers (in dt) corresponding to the subset.

    • first and last give the first and last element of a vector.

    • .(<stuff>) is the same as list(<stuff>).

    The rleid function, grouping with by inside the brackets, the .N and .I symbols, and the first and last functions are all part of the data.table package.
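
    If data.table is not available, the run IDs that rleid computes for a single vector can be sketched in base R: a new ID starts wherever an element differs from the one before it. This is a minimal stand-in, assuming a plain vector without NA values, not data.table's own implementation; run_id is a hypothetical name.

```r
# Hypothetical base R stand-in for data.table::rleid() on one vector.
# Assumes no NA values (NA comparisons would propagate into the result).
run_id <- function(v) {
  if (length(v) == 0) return(integer(0))
  # TRUE at position 1 and wherever the value differs from the previous one
  changed <- c(TRUE, v[-1] != v[-length(v)])
  # cumsum of a logical vector counts the run starts seen so far
  cumsum(changed)
}

run_id(c(14, 14, 6, 0, 0, 0, 9))
# returns 1 1 2 3 3 3 4
```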

  • 2020-12-19 02:40

    Using dplyr: take the diff of x == 0. Wherever that diff is not 0, x switches between zero and non-zero, so a new group starts there; the cumsum of those change points gives the group ID.

    library(dplyr)
    df <- data.frame(x = x, rownumber = seq_along(x))
    # Groupid increases by 1 wherever x switches between zero and non-zero
    df$Groupid <- cumsum(c(0, diff(df$x == 0)) != 0)
    df %>%
      group_by(Groupid) %>%
      summarize(start = first(rownumber), end = last(rownumber),
                number = first(x), size = n()) %>%
      filter(number == 0 & size >= 3)
    # A tibble: 3 x 5
      Groupid start   end number  size
        <int> <int> <int>  <dbl> <int>
    1       1     6    17      0    12
    2       3    34    58      0    25
    3       5    72    89      0    18
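
    The same grouping can be sketched in base R without dplyr, using split on the row numbers. The vector x_small is made-up sample data, and the threshold of 3 matches the filter in the dplyr code:

```r
# Made-up sample data for illustration
x_small <- c(5, 0, 0, 0, 3, 0, 7, 0, 0, 0, 0)

# Group ID increments wherever x_small switches between zero and non-zero
grp <- cumsum(c(0, diff(x_small == 0)) != 0)

# Row numbers belonging to each group, in group order
idx_by_grp <- split(seq_along(x_small), grp)

runs <- data.frame(
  start  = vapply(idx_by_grp, min, integer(1)),
  end    = vapply(idx_by_grp, max, integer(1)),
  number = vapply(idx_by_grp, function(i) x_small[i[1]], numeric(1)),
  size   = lengths(idx_by_grp)
)

# Keep only zero groups with at least 3 elements
result <- runs[runs$number == 0 & runs$size >= 3, ]
print(result)
```

    Here the zero runs at positions 2 to 4 and 8 to 11 are kept, while the single zero at position 6 is dropped.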
    
  • 2020-12-19 02:50
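    Starts = which(diff(x == 0) == 1) + 1
    Ends   = which(diff(x == 0) == -1)
    # a run that starts at x[1] or ends at x[length(x)] has no matching diff
    if (x[1] == 0) {
        Starts = c(1, Starts) }
    if (length(Ends) < length(Starts)) {
        Ends = c(Ends, length(x)) }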
    
    Starts
    [1]  6 34 72
    Ends
    [1] 17 58 89
    

    This works for your test data, but it accepts any run of zeros, including short ones. To ensure that you only get runs of length at least n, add:

    n = 3
    # the run from Starts[i] to Ends[i] contains Ends[i] - Starts[i] + 1 zeros
    Long = which((Ends - Starts + 1) >= n)
    Starts = Starts[Long]
    Ends = Ends[Long]
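
    A variant of the diff approach pads the logical vector with FALSE on both sides, so runs touching either end of x need no special cases, and applies the minimum length n in the same function. This is a sketch, not the answer's own code; zero_runs is a hypothetical name and x is assumed to contain no NA values.

```r
# Hypothetical helper: start/end indices of runs of at least n zeros in x.
zero_runs <- function(x, n) {
  # Padding means every run produces a +1 at its first element
  # and a -1 just past its last element in the diff
  is_zero <- c(FALSE, x == 0, FALSE)
  d <- diff(is_zero)
  starts <- which(d == 1)           # index in x where a run begins
  ends   <- which(d == -1) - 1L     # index in x where a run ends
  keep <- (ends - starts + 1L) >= n # run length counts both ends
  data.frame(starts = starts[keep], ends = ends[keep])
}

zero_runs(c(0, 0, 0, 4, 0, 7, 0, 0, 0, 0), n = 3)
# starts 1 and 7, ends 3 and 10 (the lone zero at position 5 is dropped)
```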
    
  • 2020-12-19 02:52

    Here are two base R approaches:

    1) rle: First run rle, then compute ok to pick out the runs of zeros that are more than 3 long. We then compute the starts and ends of all runs and subset to the ok ones at the end.

    with(rle(x), {
      ok <- values == 0 & lengths > 3
      ends <- cumsum(lengths)
      starts <- ends - lengths + 1
      data.frame(starts = starts[ok], ends = ends[ok])
    })
    

    giving:

      starts ends
    1      6   17
    2     34   58
    3     72   89
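
    The rle approach generalises to any minimum run length n. A minimal sketch, assuming x is a numeric vector without NA values; zero_runs_rle is a hypothetical name, and n = 4 reproduces the "more than 3 long" condition:

```r
# Hypothetical wrapper around the rle() approach with a minimum length n.
zero_runs_rle <- function(x, n) {
  r <- rle(x)
  ends   <- cumsum(r$lengths)           # last index of every run
  starts <- ends - r$lengths + 1L       # first index of every run
  ok <- r$values == 0 & r$lengths >= n  # zero runs that are long enough
  # Subsetting the vectors before building the data frame numbers
  # the rows 1, 2, ... rather than keeping the original run numbers
  data.frame(starts = starts[ok], ends = ends[ok])
}

zero_runs_rle(c(0, 0, 0, 4, 0, 7, 0, 0, 0, 0), n = 3)
```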
    

    2) gregexpr: Take the sign of each number, which is 0 or 1 as long as x has no negative values, and concatenate those into one long string. Then use gregexpr to find the locations of runs of at least 4 zeros. The result gives the starts, and each end is the start plus the match.length attribute minus 1.

    s <- paste(sign(x), collapse = "")
    g <- gregexpr("0{4,}", s)[[1]]
    data.frame(starts = 0, ends = attr(g, "match.length") - 1) + g
    

    giving:

      starts ends
    1      6   17
    2     34   58
    3     72   89
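
    The gregexpr approach can likewise take n as a parameter. A sketch under the same assumptions (no NA values; zero_runs_regex is a hypothetical name). It swaps sign(x) for as.integer(x != 0), because a negative value would paste as the two characters "-1" and shift every later string position, and it returns an empty data frame when gregexpr reports no match (-1):

```r
# Hypothetical generalisation of the gregexpr() approach.
zero_runs_regex <- function(x, n) {
  # One character per element: "0" for zero, "1" otherwise,
  # so string positions line up with indices of x
  s <- paste(as.integer(x != 0), collapse = "")
  g <- gregexpr(paste0("0{", n, ",}"), s)[[1]]
  # gregexpr() signals "no match" with a single -1
  if (g[1] == -1) return(data.frame(starts = integer(0), ends = integer(0)))
  starts <- as.integer(g)
  len <- as.integer(attr(g, "match.length"))
  data.frame(starts = starts, ends = starts + len - 1L)
}

zero_runs_regex(c(0, 0, 0, 4, 0, 7, 0, 0, 0, 0), n = 3)
```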
    