Excel or R: Preparing time series from multiple sources?

野性不改 2020-12-31 20:33

Lately I have often had to handle time series data from multiple .csv sources in the same analysis. Let's assume for simplicity that all series are regular quarterly series (no

2 answers
  •  孤独总比滥情好
    2020-12-31 21:37

    My strategy for problems of this type is:

    1. Read each data source into a standard data.frame
    2. Clean each data.frame, i.e. get the data into the desired format, process missing values, etc.
    3. Merge or join them into a single data.frame
    4. Perform any aggregate data cleanup, e.g. adding blank lines, removing duplicates, etc.
    5. Only then pass the data to the next step, such as converting it to a ts object, plotting it, etc.
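
    As a rough illustration of step 4, duplicate dates can be dropped in base R with order() and duplicated(). The data frame below is made up for this sketch and is not taken from the question.

    ```r
    # Hypothetical merged data with one repeated date (illustrative values only)
    merged <- data.frame(
      date = as.Date(c("2011-04-01", "2011-01-01", "2011-04-01")),
      V1   = c(2.5, 1.5, 2.5)
    )

    # Sort chronologically, then keep the first row seen for each date
    merged <- merged[order(merged$date), ]
    merged <- merged[!duplicated(merged$date), ]
    merged
    ```

    Keeping the first row per date is only one possible rule; if the duplicates disagree, you may want to average or inspect them instead.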

    Using your example data:

    v1 <- "27.05.11;5965.95
    26.05.11;5947.06
    25.05.11;5942.82
    24.05.11;5939.98"
    
    v2 <- "Germany;Switzerland;USA;OECDEurope
    69,90974;61,8241;55,60966;64,96157
    67,0394;62,18966;56,47361;64,15152
    70,56651;63,6347;56,87237;65,43568"
    
    
    v3 <- "1984-04-01,33.3238396624473
    1984-07-01,63.579833082501
    1984-10-01,35.8375401560349"
    
    # Read and clean data
    dat1 <- read.table(textConnection(v1), header=FALSE, sep=";", dec=".")
    names(dat1) <- c("date", "V1")
    dat1$date <- as.Date(dat1$date, format="%d.%m.%y")
    dat1
    
    dat2 <- read.table(textConnection(v2), header=TRUE, sep=";", dec=",")
    dat2$date <- seq(as.Date("2011/1/1"), by="3 months", length.out=3)
    dat2
    
    dat3 <- read.table(textConnection(v3), header=FALSE, sep=",", dec=".")
    names(dat3) <- c("date", "V2")
    dat3$date <- as.Date(dat3$date)
    dat3
    
    # Merge separate data.frames.
    # I use join() from package plyr; you may prefer merge(), rbind.fill(), etc.
    # Naming the key with by="date" makes the join column explicit.
    library(plyr)
    join(join(dat1, dat2, by="date", type="full"), dat3, by="date", type="full")
    

    The results:

             date      V1  Germany Switzerland      USA OECDEurope       V2
    1  2011-05-27 5965.95       NA          NA       NA         NA       NA
    2  2011-05-26 5947.06       NA          NA       NA         NA       NA
    3  2011-05-25 5942.82       NA          NA       NA         NA       NA
    4  2011-05-24 5939.98       NA          NA       NA         NA       NA
    5  2011-01-01      NA 69.90974    61.82410 55.60966   64.96157       NA
    6  2011-04-01      NA 67.03940    62.18966 56.47361   64.15152       NA
    7  2011-07-01      NA 70.56651    63.63470 56.87237   65.43568       NA
    8  1984-04-01      NA       NA          NA       NA         NA 33.32384
    9  1984-07-01      NA       NA          NA       NA         NA 63.57983
    10 1984-10-01      NA       NA          NA       NA         NA 35.83754
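
    If you would rather avoid the plyr dependency, roughly the same full join can be sketched in base R with merge() and Reduce(). The small data frames below are cut-down stand-ins that reuse a few values from the example data, so the block runs on its own.

    ```r
    # Cut-down stand-ins for dat1, dat2 and dat3, each with a "date" column
    dat1 <- data.frame(date = as.Date(c("2011-05-27", "2011-05-26")),
                       V1 = c(5965.95, 5947.06))
    dat2 <- data.frame(date = as.Date(c("2011-01-01", "2011-04-01")),
                       Germany = c(69.90974, 67.0394))
    dat3 <- data.frame(date = as.Date(c("1984-04-01", "1984-07-01")),
                       V2 = c(33.3238396624473, 63.579833082501))

    # Full outer join on "date": all = TRUE keeps rows found in only one source
    merged <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE),
                     list(dat1, dat2, dat3))

    # merge() sorts by the key by default, so rows come out in date order
    merged
    ```

    Unlike the join() output above, the rows here are ordered by date rather than by source.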
    
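    For step 5, a regular quarterly series can then be turned into a ts object. This is a minimal sketch for the third series only; it assumes the dates are evenly spaced quarter starts with no gaps, which ts() itself does not check.

    ```r
    # Third example series, keyed by quarter start date
    dat3 <- data.frame(
      date = as.Date(c("1984-04-01", "1984-07-01", "1984-10-01")),
      V2   = c(33.3238396624473, 63.579833082501, 35.8375401560349)
    )
    dat3 <- dat3[order(dat3$date), ]

    # Derive the starting year and quarter from the earliest date
    first         <- as.POSIXlt(dat3$date[1])
    start_year    <- first$year + 1900     # POSIXlt years count from 1900
    start_quarter <- first$mon %/% 3 + 1   # mon is 0-based: April (3) is quarter 2

    # frequency = 4 marks the series as quarterly
    v2_ts <- ts(dat3$V2, start = c(start_year, start_quarter), frequency = 4)
    v2_ts
    ```

    Series with different start dates or frequencies, such as the daily V1 values, would need to be aggregated to quarters before they can share one ts object.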
