问题
I have a data set with a character column for months (MONTH
) and a numeric column indicating years (YEAR
). In order to work with it as panel data, I need to unite these YEAR
and MONTH
into a variable with a date format.
I have tried to change the variable MONTH
to numeric format and then to merge MONTH
with the column YEAR
. R would not recognize it as a date variable.
It currently looks like this:
STATE MONTH YEAR VALUE
California JAN 2018 800
California FEB 2018 780
California MAR 2018 600
... ... ... ...
Minesota JAN 2018 800
Minesota FEB 2018 780
Minesota MAR 2018 600
... ... ... ...
I want it like this:
STATE TIME VALUE
California 01-2018 800
California 02-2018 780
California 03-2018 600
... ... ...
Minesota 01-2018 800
Minesota 02-2018 780
Minesota 03-2018 600
... ... ...
回答1:
I would recommend handling this by going through bona-fide R dates, using as.Date
to generate an R date, then using format
to render the string output you want. Something like this:
df$TIME <- format(as.Date(paste0(df$MONTH, df$YEAR, "01"), format="%b%Y%d"), "%m-%Y")
I arbitrarily assign the first to each date in your data set, but this doesn't matter, because the call to format
only includes the month and year.
回答2:
In base R you could do something like:
transform(df,TIME = paste(sprintf('%02d',match(MONTH,toupper(month.abb))),YEAR,sep = '-'))[c(1,5,4)]
STATE TIME VALUE
1 California 01-2018 800
2 California 02-2018 780
3 California 03-2018 600
4 ... NA-... ...
5 Minesota 01-2018 800
6 Minesota 02-2018 780
7 Minesota 03-2018 600
回答3:
Combining Tim's response with an easy to use date package lubridate
we get:
# This can handle months of JAN, FEB, ETC. Or it can handle months of 01,02,etc.
df$TIME <- lubridate::ymd(paste0(df$YEAR,df$MONTH,"01"))
# or if you need it in MM-YYYY format:
df$TIME <- format(lubridate::ymd(paste0(df$YEAR,df$MONTH,"01")), "%m-%Y")
回答4:
You could simplify the below, but it makes it easier to see what's going on:
library(lubridate)
library(tidyverse)
df2 <- df %>%
mutate(TIME = parse_date_time(paste0(MONTH, YEAR), orders = "%b%Y"),
TIME = as.character(substr(TIME, 6, 7)),
TIME = paste0(TIME, "-", YEAR))
This is using lubridate
- the easiest way to parse dates in R IMO, dplyr
from tidyverse
and substr
from base R.
If you want to keep the date column then just pipe in another mutate
and call the new column something different.
回答5:
If you wish to use a full-on Tidyverse solution, consider this combination of tidyr
, and lubridate
's parse_date_time
:
library(tidyverse)
df <- tibble::tribble(
~STATE, ~MONTH, ~YEAR, ~VALUE,
"California", "JAN", 2018, 800,
"California", "FEB", 2018, 780,
"California", "MAR", 2018, 600,
"Minesota", "JAN", 2018, 800,
"Minesota", "FEB", 2018, 780,
"Minesota", "MAR", 2018, 600)
df %>%
tidyr::unite(TIME, c(MONTH, YEAR), sep = "-") %>%
dplyr::mutate(TIME = lubridate::parse_date_time(TIME, "my"))
#> # A tibble: 6 x 3
#> STATE TIME VALUE
#> <chr> <dttm> <dbl>
#> 1 California 2018-01-01 00:00:00 800
#> 2 California 2018-02-01 00:00:00 780
#> 3 California 2018-03-01 00:00:00 600
#> 4 Minesota 2018-01-01 00:00:00 800
#> 5 Minesota 2018-02-01 00:00:00 780
#> 6 Minesota 2018-03-01 00:00:00 600
Also check out the following related question: Converting year and month ("yyyy-mm" format) to a date?
来源:https://stackoverflow.com/questions/54355302/i-want-to-combine-two-variables-into-one-with-a-date-format