Convert age entered as 'X Weeks, Y Days, Z hours' in R

Posted by 岁酱吖の on 2020-01-04 05:59:52

Question


I have an age variable containing observations that follow this (inconsistent) format:

3 weeks, 2 days, 4 hours
4 weeks, 6 days, 12 hours
3 days, 18 hours
4 days, 3 hours
7 hours
8 hours

I need to convert each observation to hours using R.

I have used strsplit(vector, ',') to split the variable at each comma.

I am running into trouble because splitting each observation at the ',' yields anywhere from 1 to 3 entries per observation. I do not know how to index these entries properly so that I end up with one row for each observation.

I am guessing that once I am able to store these values in sensible rows, I can extract the numeric data from each column in a row and convert accordingly, then sum the entire row.

I am also open to any different methods of approaching this problem.


Answer 1:


After you split your data you can parse the resulting list for the keywords defining the times like 'hours', 'weeks', 'days' and create a dataframe containing the relevant value (or 0 if there is no value for a certain keyword). You can achieve that with something like this:

library(dplyr)
vector = c("3 weeks, 2 days, 4 hours", "4 weeks, 6 days, 12 hours", "3 days, 18 hours", "4 days, 3 hours", "7 hours", "8 hours")
split_vector = strsplit(vector, ",", fixed = TRUE)


# Build a one-row data frame for observation i: the number attached to each
# unit keyword, or 0 when that keyword is absent.
# tibble() replaces the deprecated data_frame().
parse_string = function(i){
  x = split_vector[[i]]
  tibble(ID = i) %>%
    mutate(hours = ifelse(any(grepl("hours", x)), as.numeric(gsub("\\D", "", x[grepl("hours", x)])), 0),
           days = ifelse(any(grepl("days", x)), as.numeric(gsub("\\D", "", x[grepl("days", x)])), 0),
           weeks = ifelse(any(grepl("weeks", x)), as.numeric(gsub("\\D", "", x[grepl("weeks", x)])), 0))
}

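# Parse every observation, stack the rows, and convert everything to hours.
# bind_rows() replaces rbind_all(), which is no longer available in dplyr.
all_parsed = lapply(seq_along(split_vector), parse_string)
all_parsed = bind_rows(all_parsed) %>%
  mutate(final_hours = hours + days * 24 + weeks * 7 * 24)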
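
The same keyword idea can also be sketched in base R without splitting at all, by pulling the number in front of each unit word straight out of the original strings. This is only an illustrative variant, not part of the answer above; extract_unit is a hypothetical helper name, and the sketch assumes each unit appears at most once per observation.

```r
vector <- c("3 weeks, 2 days, 4 hours", "4 weeks, 6 days, 12 hours",
            "3 days, 18 hours", "4 days, 3 hours", "7 hours", "8 hours")

# Hypothetical helper: the number that precedes `unit` in each string,
# or 0 when the unit does not occur in that string.
extract_unit <- function(x, unit) {
  pattern <- paste0("(\\d+)\\s*", unit)
  # Each list element is c(full match, captured number), or character(0).
  m <- regmatches(x, regexec(pattern, x))
  vapply(m, function(hit) if (length(hit) == 2) as.numeric(hit[2]) else 0,
         numeric(1))
}

# Convert every unit to hours and add them up, one value per observation.
total_hours <- extract_unit(vector, "week") * 7 * 24 +
  extract_unit(vector, "day") * 24 +
  extract_unit(vector, "hour")

print(total_hours)
```

Because the pattern uses the singular stem ("week", "day", "hour"), it matches both singular and plural spellings.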



Answer 2:


Hadleyverse comes to the rescue again:

library(lubridate)
library(stringr)

dat <- readLines(textConnection(" 3   weeks,   2  days,  4 hours
 4 week,  6 days,  12 hours 
3 days, 18 hours
4 day, 3 hours
 7 hours
8  hour"))

sapply(str_split(str_trim(dat), ",[ ]*"), function(x) {
  sum(sapply(x, function(y) {
    bits <- str_split(str_trim(y), "[ ]+")[[1]]
    duration(as.numeric(bits[1]), bits[2])
  })) / 3600
})

## [1] 556 828  90  99   7   8

I deliberately mangled the data a bit (extra spaces, singular units) to show that this approach is somewhat flexible in what it parses. I don't think the second str_trim is strictly necessary, but I didn't have time to verify that.

Here is how it works: the original vector is trimmed and then split on commas into its components, which yields a list of character vectors. The code iterates over that list, trimming each element and splitting it into a number and a unit. Those are passed to lubridate's duration(); the call to sum converts the resulting durations to numeric seconds, and dividing by 3600 turns them into hours.
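
For comparison, the trim, split, and convert steps described above can be sketched in base R with a small lookup table in place of lubridate. This is a rough sketch under stated assumptions rather than the answer's own method: hours_per and to_hours are hypothetical names, and the code assumes every piece has the form "number unit" with the unit being week, day, or hour (singular or plural).

```r
dat <- c(" 3   weeks,   2  days,  4 hours", " 4 week,  6 days,  12 hours ",
         "3 days, 18 hours", "4 day, 3 hours", " 7 hours", "8  hour")

# Assumed lookup table: hours per unit, keyed by the singular unit word.
hours_per <- c(week = 168, day = 24, hour = 1)

# Hypothetical helper: total hours for one observation string.
to_hours <- function(s) {
  parts <- strsplit(trimws(s), "\\s*,\\s*")[[1]]  # e.g. "3   weeks", "2  days"
  bits <- strsplit(parts, "\\s+")                 # each is c(number, unit)
  n <- as.numeric(vapply(bits, `[`, character(1), 1))
  unit <- sub("s$", "", vapply(bits, `[`, character(1), 2))  # drop plural "s"
  sum(n * hours_per[unit])
}

hours <- vapply(dat, to_hours, numeric(1), USE.NAMES = FALSE)
print(hours)
```

An unrecognised unit would produce NA here rather than an error, so a real data set may need an explicit check.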



Source: https://stackoverflow.com/questions/32744743/convert-age-entered-as-x-weeks-y-days-z-hours-in-r
