Adding a sequence

Posted by 此生再无相见时 on 2020-07-16 09:29:06

Question


I have a dataframe that (simplified) looks something like this:

Index     StudioEvent
1
2         MovieStart
3
4
5
6
7         MovieEnd
8
9
10        MovieStart
11
12
13
14
15        MovieEnd

I would like to create a third column containing a sequence that starts at 0 and increases in steps of 50, beginning at the row where StudioEvent is MovieStart and ending at the row where StudioEvent is MovieEnd. Something like this:

Index     StudioEvent     Sequence
1
2         MovieStart      0
3                         50
4                         100
5                         150
6                         200
7         MovieEnd        250
8
9
10        MovieStart      0
11                        50
12                        100
13                        150
14                        200
15        MovieEnd        250

Any idea how I can do it? Thank you in advance.


Answer 1:


Here is a base R option:

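# Row ranges spanning each MovieStart..MovieEnd pair
inds <- Map(`:`, which(df$StudioEvent == "MovieStart"), which(df$StudioEvent == "MovieEnd"))
# Fill those rows with 0, 50, 100, ...; every other row stays NA
df$Sequence <- replace(rep(NA_real_, nrow(df)), unlist(inds), (unlist(Map(seq_along, inds)) - 1) * 50)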

which gives

> df
   Index StudioEvent Sequence
1      1        <NA>       NA
2      2  MovieStart        0
3      3        <NA>       50
4      4        <NA>      100
5      5        <NA>      150
6      6        <NA>      200
7      7    MovieEnd      250
8      8        <NA>       NA
9      9        <NA>       NA
10    10  MovieStart        0
11    11        <NA>       50
12    12        <NA>      100
13    13        <NA>      150
14    14        <NA>      200
15    15    MovieEnd      250

Data

> dput(df)
structure(list(Index = 1:15, StudioEvent = c(NA, "MovieStart", 
NA, NA, NA, NA, "MovieEnd", NA, NA, "MovieStart", NA, NA, NA,
NA, "MovieEnd")), row.names = c(NA, -15L), class = "data.frame")
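As a possible variation on the answer above, the same result can be sketched without `Map` by looking up, for every row, the most recent MovieStart with base R's `findInterval`. This is only an illustration, not the answerer's method; the helper names `rows`, `starts`, `ends`, `movie` and `inside` are made up here, and the sketch assumes every MovieStart is followed by a matching MovieEnd before the next MovieStart.

```r
# Same sample data as in the answer above
df <- data.frame(
  Index = 1:15,
  StudioEvent = c(NA, "MovieStart", NA, NA, NA, NA, "MovieEnd", NA, NA,
                  "MovieStart", NA, NA, NA, NA, "MovieEnd")
)

rows   <- seq_len(nrow(df))
starts <- which(df$StudioEvent == "MovieStart")  # which() skips the NA rows
ends   <- which(df$StudioEvent == "MovieEnd")

# For each row, the number of the latest MovieStart at or before it (0 if none yet)
movie <- findInterval(rows, starts)

# pmax() avoids indexing with 0 for rows that precede the first MovieStart
current <- pmax(movie, 1)

# A row is inside a movie if it has a start and does not lie past that movie's end
inside <- movie > 0 & rows <= ends[current]

# Distance from the movie's start row, in steps of 50; NA outside any movie
df$Sequence <- ifelse(inside, (rows - starts[current]) * 50, NA)
```

Because everything is vectorised, no per-movie ranges need to be built, at the cost of relying on the start/end pairing assumption stated above.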



Answer 2:


You could try this; it worked for me. (MovieStart and MovieEnd are abbreviated to MS and ME here.)

n <- data.frame(index = 1:15,
                studio = c(NA, "MS", NA, NA, NA, NA, "ME", NA, NA, "MS", NA, NA, NA, NA, "ME"))

n$studio2 <- 0   # new column: 1 marks a movie start, 2 a movie end
n$studio2[n$studio == "MS"] <- 1
n$studio2[n$studio == "ME"] <- 2

n$seq <- NA_real_  # new sequence column; rows outside a movie stay NA
j <- 1             # counter for the current movie

k1 <- which(n$studio == "MS")  # where movies start
k2 <- which(n$studio == "ME")  # where movies end

Loop

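for (i in seq_along(n$studio2)) {
  if (n$studio2[i] == 1) {
    k <- k2[j] - k1[j]             # number of 50-unit steps in this movie
    w <- seq(0, k * 50, by = 50)   # 0, 50, ..., k * 50
    n$seq[k1[j]:k2[j]] <- w
    j <- j + 1                     # move on to the next movie
  }
}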
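The loop above pairs the j-th start with the j-th end, so it quietly depends on the two lists lining up. A minimal sketch of the same idea wrapped in a function with that assumption checked might look like this; `add_sequence` and its arguments are hypothetical names introduced for illustration, not part of the answer.

```r
# Hypothetical helper: fills 0, step, 2 * step, ... between each start/end
# pair and leaves every other position NA.
add_sequence <- function(events, start = "MS", end = "ME", step = 50) {
  k1 <- which(events == start)  # positions where a movie starts
  k2 <- which(events == end)    # positions where a movie ends

  # The pairing below assumes each start has its own, later end
  stopifnot(length(k1) == length(k2), all(k1 < k2))

  out <- rep(NA_real_, length(events))
  for (j in seq_along(k1)) {
    out[k1[j]:k2[j]] <- seq(0, by = step, length.out = k2[j] - k1[j] + 1)
  }
  out
}

studio <- c(NA, "MS", NA, NA, NA, NA, "ME", NA, NA, "MS", NA, NA, NA, NA, "ME")
result <- add_sequence(studio)
```

Looping over the start positions rather than over every row drops the need for the `studio2` marker column and the separate counter. The check does not catch overlapping movies (a second start before the first end), so that case would still need handling if it can occur in the real data.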



Answer 3:


An option using data.table:

# cs is 1 from each MovieStart up to the row before its MovieEnd, and 0 elsewhere
DT[, cs := cumsum(StudioEvent=="MovieStart") - cumsum(StudioEvent=="MovieEnd")]

# rolling join: for each MovieEnd row and each row inside a movie,
# look up the Index of the most recent MovieStart
DT[StudioEvent=="MovieEnd" | cs == 1L, 
    ms := DT[StudioEvent=="MovieStart"][.SD, on=.(Index), roll=Inf, x.Index]
]

# generate the sequence from the distance to that MovieStart
DT[, Sequence := (Index - ms) * 50]

output:

    Index StudioEvent cs ms Sequence
 1:     1              0 NA       NA
 2:     2  MovieStart  1  2        0
 3:     3              1  2       50
 4:     4              1  2      100
 5:     5              1  2      150
 6:     6              1  2      200
 7:     7    MovieEnd  0  2      250
 8:     8              0 NA       NA
 9:     9              0 NA       NA
10:    10  MovieStart  1 10        0
11:    11              1 10       50
12:    12              1 10      100
13:    13              1 10      150
14:    14              1 10      200
15:    15    MovieEnd  0 10      250

data (load this before running the code above):

library(data.table)
DT <- fread("Index,StudioEvent 
1,
2,MovieStart
3,
4,
5,
6,
7,MovieEnd 
8,
9,
10,MovieStart
11,
12,
13,
14,
15,MovieEnd")
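To see what the `cs` column is doing, the running-count idea can be reproduced in base R and combined with a per-movie row counter in place of the rolling join. This is a rough sketch rather than a translation of the data.table code; the vector `events` mirrors the empty strings shown in the output above, and `is_start`, `is_end`, `inside`, `movie_id` and `step` are names chosen here for illustration.

```r
events <- c("", "MovieStart", "", "", "", "", "MovieEnd", "", "",
            "MovieStart", "", "", "", "", "MovieEnd")

# %in% returns FALSE (never NA) for blanks, so it also works if blanks are NA
is_start <- events %in% "MovieStart"
is_end   <- events %in% "MovieEnd"

# Starts seen so far minus ends seen so far:
# 1 from a MovieStart up to the row before its MovieEnd, 0 elsewhere
cs <- cumsum(is_start) - cumsum(is_end)

# Rows that belong to a movie: cs == 1, plus the MovieEnd row itself
inside <- cs == 1 | is_end

# Number the movies, then count rows within each movie starting from 0
movie_id <- cumsum(is_start)
step <- ave(seq_along(events), movie_id, FUN = seq_along) - 1

# Keep the counter only for rows inside a movie
Sequence <- ifelse(inside, step * 50, NA)
```

Counting rows per movie gives the same numbers as `(Index - ms) * 50` only when `Index` increases by 1 per row, which holds for the sample data but is an assumption about the real data.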


Source: https://stackoverflow.com/questions/62088331/adding-a-sequence
