How to calculate time difference in consecutive rows

Posted by 左心房为你撑大大i on 2020-02-04 04:10:36

Question


The raw data looks like this. I want to sort it by visitor and time, calculate the time difference between consecutive rows for each visitor, and then save the result to a new file.

  visitor         v_time payment items
1    Jack 1/2/2018 16:07      35     3
2    Jack 1/2/2018 16:09     160     1
3   David 1/2/2018 16:12      25     2
4    Kate 1/2/2018 16:16       3     3
5   David 1/2/2018 16:21      25     5
6    Jack 1/2/2018 16:32      85     5
7    Kate 1/2/2018 16:33     639     3
8    Jack 1/2/2018 16:55       6     2

The grouping and sorting work, but the time difference is not calculated (every value comes out as NA), and the saved file does not contain the new columns.

visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07","1/2/2018 16:09","1/2/2018 16:12","1/2/2018 16:16","1/2/2018 16:21","1/2/2018 16:32","1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35,160,25,3,25,85,639,6)
items <- c(3,1,2,3,5,5,3,2)
df <- data.frame(visitor, v_time, payment, items)

df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M")), diff_secs = as.numeric(diff, units = 'secs'))

write.csv(df,"C:/output.csv", row.names = F)

What is my error and the right way of doing it?

# A tibble: 8 x 6
# Groups: visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12   25.0   2.00 NA            NA
2 David   1/2/2018 16:21   25.0   5.00 NA            NA
3 Jack    1/2/2018 16:07   35.0   3.00 NA            NA
4 Jack    1/2/2018 16:09  160     1.00 NA            NA
5 Jack    1/2/2018 16:32   85.0   5.00 NA            NA
6 Jack    1/2/2018 16:55    6.00  2.00 NA            NA
7 Kate    1/2/2018 16:16    3.00  3.00 NA            NA
8 Kate    1/2/2018 16:33  639     3.00 NA            NA

Answer 1:


If you add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag() call:

df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))

you get the result you expect:

> df
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 540         540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 120         120.
5 Jack    1/2/2018 16:32     85.    5. 1380       1380.
6 Jack    1/2/2018 16:55      6.    2. 1380       1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 1020       1020.

Another option is to use difftime:

df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = difftime(strptime(v_time, "%d/%m/%Y %H:%M"), lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]), units = 'mins'),
         diff_secs = as.numeric(diff, units = 'secs'))

Now the diff column is in minutes and the diff_secs column is in seconds:

> df
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 9           540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 2           120.
5 Jack    1/2/2018 16:32     85.    5. 23         1380.
6 Jack    1/2/2018 16:55      6.    2. 23         1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 17         1020.

You can now save the result with write.csv(df, "C:/output.csv", row.names = FALSE)
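Both variants above call strptime() three times and sort on the text of v_time rather than on the parsed time. A minimal sketch of the same idea that parses once into POSIXct might look like the following. The helper column name v_time_parsed, the result name res, and the "UTC" time zone are assumptions for illustration, not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  visitor = c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack"),
  v_time  = c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
              "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55"),
  payment = c(35, 160, 25, 3, 25, 85, 639, 6),
  items   = c(3, 1, 2, 3, 5, 5, 3, 2)
)

res <- df %>%
  # Parse once; as.character() also covers the case where v_time is a factor.
  # tz = "UTC" is an assumed time zone, chosen only to make the example reproducible.
  mutate(v_time_parsed = as.POSIXct(as.character(v_time),
                                    format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  # Sort on the parsed time, not on the text
  arrange(visitor, v_time_parsed) %>%
  group_by(visitor) %>%
  # Difference to the previous row of the same visitor; 0 for the first row
  mutate(diff_secs = as.numeric(difftime(
    v_time_parsed,
    lag(v_time_parsed, default = first(v_time_parsed)),
    units = "secs"
  ))) %>%
  ungroup()
```

Sorting on the parsed column matters once the data spans several days or months, where the text order of "d/m/Y" strings no longer matches chronological order.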




Answer 2:


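The error comes from lag(strptime(v_time, "%d/%m/%Y %H:%M")): strptime() returns a POSIXlt object, which is a list of date-time components rather than a plain vector, and lag() does not handle it properly here.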

Error message:

# Error in format.POSIXlt(x, usetz = TRUE) : 
#  invalid component [[10]] in "POSIXlt" should be 'zone'

To avoid this, lag the raw column first and parse afterwards, i.e. strptime(lag(v_time), "%d/%m/%Y %H:%M"):

df <- df %>%
    arrange(visitor, v_time) %>%
    group_by(visitor) %>%
    mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - strptime(lag(v_time), "%d/%m/%Y %H:%M"), diff_secs = as.numeric(diff, units = 'secs'))
print(df)

Output:

# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor         v_time payment items    diff diff_secs
   <fctr>         <fctr>   <dbl> <dbl>  <time>     <dbl>
1   David 1/2/2018 16:12      25     2 NA mins        NA
2   David 1/2/2018 16:21      25     5  9 mins       540
3    Jack 1/2/2018 16:07      35     3 NA mins        NA
4    Jack 1/2/2018 16:09     160     1  2 mins       120
5    Jack 1/2/2018 16:32      85     5 23 mins      1380
6    Jack 1/2/2018 16:55       6     2 23 mins      1380
7    Kate 1/2/2018 16:16       3     3 NA mins        NA
8    Kate 1/2/2018 16:33     639     3 17 mins      1020
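As background on why the order of lag() and strptime() matters, the following sketch contrasts the two base R date-time classes. The variable names and the "UTC" time zone are assumptions made for the example.

```r
x <- c("1/2/2018 16:07", "1/2/2018 16:09")

# strptime() returns POSIXlt: a list with one element per component (sec, min, hour, ...)
lt <- strptime(x, "%d/%m/%Y %H:%M", tz = "UTC")
# as.POSIXct() converts it to POSIXct: a numeric vector of seconds since 1970-01-01
ct <- as.POSIXct(lt)

lt_is_list    <- is.list(unclass(lt))     # TRUE
ct_is_numeric <- is.numeric(unclass(ct))  # TRUE

# Differences behave the same for both classes; here, two minutes
gap <- as.numeric(difftime(ct[2], ct[1], units = "secs"))  # 120
```

Because POSIXct is an ordinary vector underneath, it is usually the safer class to keep in a data frame column.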

Don't forget to assign the result back with df <- ... before you export it; otherwise write.csv() writes the original df, without the new columns.
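The effect of the missing assignment can be seen in a small sketch. The helper add_diff() is hypothetical, the data is shortened to three rows, and a temporary file stands in for "C:/output.csv"; the "UTC" time zone is likewise an assumption.

```r
library(dplyr)

df <- data.frame(
  visitor = c("Jack", "Jack", "David"),
  v_time  = c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12")
)

# Hypothetical helper wrapping the pipeline
add_diff <- function(d) {
  d %>%
    mutate(v_time = as.POSIXct(as.character(v_time),
                               format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
    arrange(visitor, v_time) %>%
    group_by(visitor) %>%
    mutate(diff_secs = as.numeric(difftime(v_time, lag(v_time), units = "secs"))) %>%
    ungroup()
}

add_diff(df)               # result is printed and then discarded
cols_before <- names(df)   # still only "visitor" and "v_time"

df <- add_diff(df)         # assign back so df carries the new column
cols_after <- names(df)

out_file <- tempfile(fileext = ".csv")   # assumed path for the example
write.csv(df, out_file, row.names = FALSE)
saved <- read.csv(out_file)              # the saved file now has diff_secs
```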




Answer 3:


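Here's an approach with the lubridate package and an explicit loop. The loop only fills in a difference when the previous row belongs to the same visitor, so the first row of each visitor stays at 0:

library(dplyr)
library(lubridate)
# The question parses v_time as day/month/year, so dmy_hm() is used here
df$v_time <- dmy_hm(df$v_time)
df <- df %>%
  arrange(visitor, v_time)
df$diff <- rep(0, nrow(df))
for (i in seq_len(nrow(df) - 1)) {
  # Skip the boundary between two visitors
  if (df$visitor[i + 1] == df$visitor[i]) {
    # Difference to the previous row, in seconds
    df$diff[i + 1] <- as.numeric(difftime(df$v_time[i + 1], df$v_time[i], units = "secs"))
  }
}
write.csv(df, "C:/output.csv", row.names = FALSE)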
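For comparison, the per-visitor difference can also be computed without a loop or extra packages. This is a base R sketch using ave(); the column name diff_secs and the "UTC" time zone are assumptions for the example.

```r
df <- data.frame(
  visitor = c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack"),
  v_time  = c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
              "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55"),
  stringsAsFactors = FALSE
)

# Parse as day/month/year; tz = "UTC" is an assumed time zone
df$v_time <- as.POSIXct(df$v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Sort by visitor, then by time
df <- df[order(df$visitor, df$v_time), ]

# ave() applies FUN within each visitor; diff() yields the gaps in seconds,
# and the leading 0 fills the first row of each visitor
df$diff_secs <- ave(as.numeric(df$v_time), df$visitor,
                    FUN = function(s) c(0, diff(s)))
```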



Answer 4:


Here is an option with difftime. We convert 'v_time' to a date-time with dmy_hm (from lubridate); then, after arranging and grouping by 'visitor', we use difftime to get the difference in seconds:

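library(tidyverse)
library(lubridate)  # for dmy_hm(); library(tidyverse) attaches lubridate only from tidyverse 2.0.0 onward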
out <- df %>% 
        mutate(v_time = dmy_hm(v_time)) %>% 
        arrange(visitor, v_time) %>% 
        group_by(visitor) %>%
        mutate(diff = difftime(v_time, lag(v_time, default = first(v_time)), units = "secs"))
# A tibble: 8 x 5
# Groups: visitor [3]
#  visitor v_time              payment items diff  
#  <fctr>  <dttm>                <dbl> <dbl> <time>
#1 David   2018-02-01 16:12:00   25.0   2.00 0     
#2 David   2018-02-01 16:21:00   25.0   5.00 540   
#3 Jack    2018-02-01 16:07:00   35.0   3.00 0     
#4 Jack    2018-02-01 16:09:00  160     1.00 120   
#5 Jack    2018-02-01 16:32:00   85.0   5.00 1380  
#6 Jack    2018-02-01 16:55:00    6.00  2.00 1380  
#7 Kate    2018-02-01 16:16:00    3.00  3.00 0     
#8 Kate    2018-02-01 16:33:00  639     3.00 1020  

Then, we write to csv with write_csv

write_csv(out, "yourfile.csv")


Source: https://stackoverflow.com/questions/49003378/how-to-calculate-time-difference-in-consecutive-rows
