Transforming a dataframe into a TS in R [duplicate]

Submitted by 允我心安 on 2020-03-15 09:32:29

Question


I've been trying to transform a dataframe I put together into a Time Series, but for some reason it doesn't work. I am very new to R.

> x <- Sales_AEMBG %>%
+   select(Ecriture.DatEcr, Crédit, Mapping)
> names(x) <- c("Dates", "Revenue", "Mapping")
> str(x)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   15167 obs. of  3 variables:
 $ Dates  : POSIXct, format: "2016-01-02" "2016-01-02" "2016-01-02" "2016-01-02" ...
 $ Revenue: num  124065 214631 135810 225293 57804 ...
 $ Mapping: chr  "E.M 1.5 L" "E.M 1.5 L" "E.M 1.5 L" "E.M 1.5 L" ...

When I look at the data, here's what I get:

> head(x)
# A tibble: 6 x 3
  Dates               Revenue Mapping  
  <dttm>                <dbl> <chr>    
1 2016-01-02 00:00:00 124065. E.M 1.5 L
2 2016-01-02 00:00:00 214631. E.M 1.5 L
3 2016-01-02 00:00:00 135810. E.M 1.5 L
4 2016-01-02 00:00:00 225293. E.M 1.5 L
5 2016-01-02 00:00:00  57804. E.M 1.5 L
6 2016-01-02 00:00:00 124065. E.M 1.5 L

Of course, I tried the as.ts function

> x_xts <- as.ts(x)
Warning message:
In data.matrix(data) : NAs introduced by coercion
> is.ts(x)
[1] FALSE

But it keeps telling me that my dataframe is still not recognized as a TS.

What do you suggest?

Thanks
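Two details in the transcript above are worth spelling out. First, is.ts() is called on x, which is the original tibble, and not on the result x_xts. as.ts() returns a new object and never modifies x, so is.ts(x) is FALSE no matter what the conversion did. Second, the warning comes from data.matrix(), which ts() uses to turn a data frame into a numeric matrix. Non-numeric columns such as Dates and Mapping therefore get coerced to numbers, or to NA, as the warning shows. The minimal sketch below uses a hypothetical all-numeric data frame df, not the question's data, to show the difference:

```r
# Hypothetical all-numeric data frame, standing in for the question's tibble
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# as.ts() returns a new object; df itself is left unchanged
converted <- as.ts(df)

is.ts(df)         # FALSE: still a data frame
is.ts(converted)  # TRUE: a multi-column ("mts") time series
```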


Answer 1:


I've added a few more observations to your data.

# A tibble: 12 x 3
   Dates               Revenue Mapping  
   <dttm>                <dbl> <chr>    
 1 2016-01-02 00:00:00  124065 E.M 1.5 L
 2 2016-01-02 00:00:00  214631 E.M 1.5 L
 3 2016-01-03 00:00:00  135810 E.M 1.5 L
 4 2016-01-03 00:00:00  225293 E.M 1.5 L
 5 2016-01-05 00:00:00   57804 E.M 1.5 L
 6 2016-01-05 00:00:00  124065 E.M 1.5 L
 7 2016-01-02 00:00:00   24065 E.M 1.5 M
 8 2016-01-02 00:00:00   14631 E.M 1.5 M
 9 2016-01-03 00:00:00   35810 E.M 1.5 M
10 2016-01-03 00:00:00   25293 E.M 1.5 M
11 2016-01-05 00:00:00    7804 E.M 1.5 M
12 2016-01-05 00:00:00   24065 E.M 1.5 M

First, you need to sum the sales by day (Dates) and product type (your Mapping variable?). Then pivot the result into a wide format with one column per product:

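library(dplyr)
library(tidyr)

# Total revenue per product (Mapping) and day, then one column per product
x.sum <- x %>%
  group_by(Mapping, Dates) %>%
  summarise(Revenue = sum(Revenue), .groups = "drop") %>%
  pivot_wider(id_cols = Dates, names_from = Mapping, values_from = Revenue)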

# A tibble: 3 x 3
  Dates               `E.M 1.5 L` `E.M 1.5 M`
  <dttm>                    <dbl>       <dbl>
1 2016-01-02 00:00:00      338696       38696
2 2016-01-03 00:00:00      361103       61103
3 2016-01-05 00:00:00      181869       31869
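If you don't have dplyr or tidyr, you can sketch the same aggregation in base R, using aggregate() and reshape() in place of group_by()/summarise() and pivot_wider(). The data frame below rebuilds the 12-row example as a plain data.frame rather than a tibble. The time zone tz = "UTC" is an assumption:

```r
# Rebuild the 12-row example (tz = "UTC" is an assumption)
day_pattern <- c("2016-01-02", "2016-01-02", "2016-01-03",
                 "2016-01-03", "2016-01-05", "2016-01-05")
x <- data.frame(
  Dates   = as.POSIXct(rep(day_pattern, times = 2), tz = "UTC"),
  Revenue = c(124065, 214631, 135810, 225293, 57804, 124065,
              24065, 14631, 35810, 25293, 7804, 24065),
  Mapping = rep(c("E.M 1.5 L", "E.M 1.5 M"), each = 6),
  stringsAsFactors = FALSE
)

# Sum Revenue for each (Dates, Mapping) pair: the group_by() + summarise() step
sums <- aggregate(Revenue ~ Dates + Mapping, data = x, FUN = sum)

# One row per date, one column per Mapping value: the pivot_wider() step
x.sum <- reshape(sums, idvar = "Dates", timevar = "Mapping",
                 v.names = "Revenue", direction = "wide")

# reshape() names the new columns "Revenue.<Mapping>"; strip that prefix
names(x.sum) <- sub("^Revenue\\.", "", names(x.sum))
x.sum
```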

Note that I've deliberately omitted Jan 4.

If your time series has missing days, as with stock prices when financial markets are closed on weekends, then as.ts (or ts) won't give the right result. ts assumes the observations are equally spaced with no gaps. If there are no missing days, the correct way to convert the data into a time series object ("ts") is to specify the column(s) to convert (x.sum[,2:3]), the start (January 2, 2016) and the frequency (daily) of the series.

x.ts <- ts(x.sum[,2:3], start=c(2016, 2), frequency=365)

Be careful with start, because the meaning of its second element depends on the specified frequency. Here, 365 means daily, so the "2" means day 2 of 2016. If the frequency were monthly (frequency=12), the "2" would mean month 2 (February) of 2016.
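A quick way to check how start is interpreted is to build the same three values with both frequencies and inspect start() and tsp(). The values 10, 20, 30 are placeholders, not the question's data:

```r
values <- c(10, 20, 30)  # placeholder values

daily   <- ts(values, start = c(2016, 2), frequency = 365)
monthly <- ts(values, start = c(2016, 2), frequency = 12)

start(daily)    # 2016 2: day 2 of 2016
start(monthly)  # 2016 2: month 2 (February) of 2016

# Internally, start is stored as a fractional year: year + (period - 1) / frequency
tsp(daily)[1]    # 2016 + 1/365
tsp(monthly)[1]  # 2016 + 1/12
```

Both series report the same start(), but they begin at different points in time because the period is counted in days or in months.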

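But as I mentioned, ts doesn't know about missing days: it simply assigns consecutive time points to consecutive rows. So with this made-up data, where Jan 4 is missing, the Jan 5 values would be labelled as Jan 4, and a plot of the time series would show the wrong information.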
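You can make the mislabelling visible by comparing two values for each row: the day of the year that ts() assigns (via cycle()) and the day the row actually belongs to. This sketch uses the E.M 1.5 L daily totals from the table above. For brevity, the dates are stored as Date rather than POSIXct:

```r
# Daily totals for Jan 2, 3 and 5, 2016 (Jan 4 missing), from the example above
dates   <- as.Date(c("2016-01-02", "2016-01-03", "2016-01-05"))
revenue <- c(338696, 361103, 181869)

x.ts <- ts(revenue, start = c(2016, 2), frequency = 365)

# Day of the year that ts() assigns to each observation
ts_day <- as.integer(cycle(x.ts))        # 2 3 4

# Day of the year each observation actually belongs to (yday is 0-based)
real_day <- as.POSIXlt(dates)$yday + 1L  # 2 3 5
```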

In this case, packages such as xts and zoo, which index each observation by its actual date, simplify the work.

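library(xts)
# xts indexes each row by its actual date, so the gap on Jan 4 is preserved
x.xts <- xts(x.sum[,2:3], order.by=x.sum$Dates)

plot(x.xts) # Correct results: each value is plotted at its actual date.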
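If you still need a plain ts object (for example, for a function that only accepts ts), one option is to pad the data to a complete daily calendar first, so that each row really is one day. This sketch assumes that days without sales can be treated as NA. The short column names L and M stand in for E.M 1.5 L and E.M 1.5 M, and Dates is stored as Date:

```r
# Daily totals with Jan 4 missing (values from the example above)
x.sum <- data.frame(
  Dates = as.Date(c("2016-01-02", "2016-01-03", "2016-01-05")),
  L = c(338696, 361103, 181869),  # stand-in for E.M 1.5 L
  M = c(38696, 61103, 31869)      # stand-in for E.M 1.5 M
)

# Full calendar from the first to the last observed day
all_days <- data.frame(Dates = seq(min(x.sum$Dates), max(x.sum$Dates), by = "day"))

# Left join: days without sales get NA instead of being skipped
padded <- merge(all_days, x.sum, by = "Dates", all.x = TRUE)

# Every row is now exactly one day, so ts() time points match the real dates
x.ts <- ts(padded[, c("L", "M")], start = c(2016, 2), frequency = 365)
```

Whether NA, zero or an interpolated value is the right filler for a missing day depends on what the gap means in your data.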

Other answers about time series can be found here and here.



Source: https://stackoverflow.com/questions/60483201/transforming-a-dataframe-into-a-ts-in-r
