Indexing sequence chunks using data.table

好久不见. Submitted on 2019-12-02 00:41:43

If I understand your question correctly, you want to set fix_min to FALSE when R == 0 or when R == 1 & (1 <= Seq < 6 | Seq > 6). If so, the following should give you what you want:

# recreating the data from your first code block
library(data.table)
set.seed(1)
DT1 <- data.table(R=sample(0:1, 20000, rep=TRUE))[, smp:=.I
                                                  ][, Seq:=seq(.N), by=rleid(R)
                                                    ][, Seq2 := Seq[.N], by=rleid(R)]

# adding the needed 'fix_min' column
DT1[, fix_min := (R==1 & Seq[.N] > 1 & Seq%%6!=0), by=rleid(R)
    ][R==1 & Seq%%6==1 & Seq2%%6==1 & Seq==Seq2, fix_min := FALSE]

Explanation:

  • data.table(R=sample(0:1, 20000, rep=TRUE)) creates the base of the data.table
  • [, smp:=.I] adds a row-number column smp to the data.table (.I holds the row indices)
  • by=rleid(R) identifies the sequences (runs of consecutive identical R values) by giving each one its own id; to see what it does, try: data.table(R=sample(0:1, 20000, rep=TRUE))[, seq.id:=rleid(R)]
  • [, Seq:=seq(.N), by=rleid(R)] numbers the rows within each sequence (1, 2, ..., .N) and adds that index to the data.table
  • [, Seq2 := Seq[.N], by=rleid(R)] adds a column holding the length of the sequence, i.e. the last Seq value of each sequence
  • fix_min := (R==1 & Seq[.N] > 1 & Seq%%6!=0) creates a logical vector that is TRUE where R==1, the sequence is longer than one row (Seq[.N] > 1), and the position within the sequence is not a multiple of 6 (Seq%%6!=0)
  • R==1 & Seq%%6==1 & Seq2%%6==1 & Seq==Seq2 filters the data.table to the last row (Seq==Seq2) of each R==1 sequence whose length is 7, 13, 19, etc. (Seq2%%6==1; on that last row Seq%%6==1 holds as well). Sequences of length 1 also match, but their rows are already FALSE. fix_min := FALSE then sets the selected rows to FALSE.
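
To make the steps above easier to follow, here is a small sketch that applies the same expressions to a hand-built input instead of the 20000-row sample. The table name toy, the helper column run, and the input vector are assumptions chosen only for illustration; they are not part of the original question.

```r
library(data.table)

# Hypothetical toy input (not from the question): a run of seven 1s,
# one 0, a single 1, two 0s, and a run of six 1s.
toy <- data.table(R = c(rep(1L, 7), 0L, 1L, 0L, 0L, rep(1L, 6)))

# Run id, only for inspection: 1 1 1 1 1 1 1 2 3 4 4 5 5 5 5 5 5
toy[, run := rleid(R)]

# Same steps as for DT1
toy[, Seq := seq(.N), by = rleid(R)]    # position within each run
toy[, Seq2 := Seq[.N], by = rleid(R)]   # length of each run
toy[, fix_min := (R == 1 & Seq[.N] > 1 & Seq %% 6 != 0), by = rleid(R)]
toy[R == 1 & Seq %% 6 == 1 & Seq2 %% 6 == 1 & Seq == Seq2, fix_min := FALSE]

print(toy)
# fix_min by run:
#   run 1 (seven 1s): TRUE x5, FALSE (Seq 6), FALSE (last row, length 7)
#   run 2 (one 0):    FALSE
#   run 3 (one 1):    FALSE (length 1)
#   run 4 (two 0s):   FALSE FALSE
#   run 5 (six 1s):   TRUE x5, FALSE (Seq 6)
```

The run of seven shows both rules at work: row 6 is FALSE because its position is a multiple of 6, and row 7 is FALSE because it is the last row of a run whose length leaves remainder 1 when divided by 6.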