R: Separating out a mixed data column, date above multiple times

Question

I have a data.frame in which one column is a mixed vector: each date sits above the sequence of times that belong to it. I'd like to convert this into some kind of POSIX date-time field.

For example:

"7/16/2014", "5:06:59 PM", "11:51:26 AM", "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM", "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM", "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM"

Conceptually, the approach seems to be: copy this MIXED vector into a DATEONLY vector and a TIMEONLY vector using regular expressions, so that every value keeps its position; use something like the fill function from tidyr to fill in the blank spots in the DATEONLY vector; then recombine the DATEONLY and TIMEONLY columns. But I'm a bit stumped as to where to start.

I'd like the result to look like this:

"7/16/2014 5:06:59 PM", "7/16/2014 11:51:26 AM", "7/13/2014 3:53:16 PM" etc...

Answer 1:

I do not think this is a concise way to achieve your task, but the following works. I could not come up with a good way of splitting the vector (i.e., x) directly, so I decided to work with a data frame. First, I created a group variable. To do that, as you mentioned in your question, I searched for the indices of the dates (month/day/year). Using those indices and na.locf(), I filled in the group column. Then I split the data by group and pasted each date onto its times with stri_join(). Finally, I unlisted the list. The result is a character vector; if you want date-time objects, you still need to convert it.

library(zoo)
library(magrittr)
library(stringi)

x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
       "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
       "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
       "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")

# Create a data frame
# (stringsAsFactors = FALSE keeps the column as character on R < 4.0)
mydf <- data.frame(date = x, group = NA, stringsAsFactors = FALSE)

# Get indices for date (month/day/year)
ind <- grep(pattern = "\\d+/\\d+/\\d+", x = mydf$date)

# Add group number to the ind positions of mydf$group and
# fill NA with the group numbers

mydf$group[ind] <- seq_along(ind)
mydf$group <- na.locf(mydf$group)

# Split the data frame by group and create dates (in character)
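# The first row of each group is the date; the remaining rows are its times.
# Indexing with [-1] also copes with a date that has no times below it.
split(mydf, mydf$group) %>%
    lapply(function(g) {
        stri_join(g$date[1], g$date[-1], sep = " ")
    }) %>%
    unlist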


                     11                      12                      21                      22 
"7/16/2014 5:06:59 PM" "7/16/2014 11:51:26 AM"  "7/13/2014 3:53:16 PM"  "7/13/2014 3:24:19 PM" 
                     23                       3                       4                      51 
"7/13/2014 11:47:49 AM" "7/12/2014 11:57:41 AM" "7/11/2014 10:01:48 AM" "7/10/2014 4:54:08 PM" 
                     52                      53 
"7/10/2014 2:23:04 PM" "7/10/2014 11:34:09 AM"

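For date-time objects rather than strings, one possible follow-up is sketched below. It is not part of the original answer: it uses base R only and replaces the split/lapply step with a cumsum() index. The names is_date, date_filled, stamps and parsed are made up for illustration. It assumes that x starts with a date, that the times use a 12-hour clock with AM/PM, and that UTC is an acceptable time zone (the question does not state one). Parsing %p also depends on the current locale recognising "AM" and "PM".

```r
# Same input vector as in the answer above
x <- c("7/16/2014", "5:06:59 PM", "11:51:26 AM",
       "7/13/2014", "3:53:16 PM", "3:24:19 PM", "11:47:49 AM",
       "7/12/2014", "11:57:41 AM", "7/11/2014", "10:01:48 AM",
       "7/10/2014", "4:54:08 PM", "2:23:04 PM", "11:34:09 AM")

# TRUE where the element looks like a date (month/day/year)
is_date <- grepl("^\\d+/\\d+/\\d+$", x)

# cumsum() numbers each element by the most recent date seen so far,
# so indexing the dates with it carries every date forward over its times
# (assumption: the first element of x is a date)
date_filled <- x[is_date][cumsum(is_date)]

# Keep only the time rows and put the carried-forward date in front
stamps <- paste(date_filled[!is_date], x[!is_date])

# Parse to POSIXct; tz = "UTC" is an assumption, adjust as needed
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
```

Here stamps holds the same ten strings as the output above (without the names), and parsed is the corresponding POSIXct vector. A date with no times below it is simply dropped, because only the time rows are kept.
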
Source: https://stackoverflow.com/questions/34169485/r-separating-out-a-mixed-data-column-date-above-multiple-times

Tags

dplyr

tidyr