Read csv from specific row

后悔当初 2020-12-04 18:27

I have daily data starting from 1980 in a CSV file, but I want to read only the data from 1985 onward, because the other dataset, in another file, starts from 1985. How can I skip reading the earlier rows?

3 Answers
  •  萌比男神i
    2020-12-04 18:55

    Here are a few alternatives. (You may wish to convert the first column to "Date" class afterwards and possibly convert the entire thing to a zoo object or other time series class object.)

    # create test data
    fn <- tempfile()
    dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
    DF <- data.frame(Date = dd, Value = seq_along(dd))
    write.table(DF, file = fn, row.names = FALSE)
    

    read.table + subset

    # if file is small enough to fit in memory try this:
    
    DF2 <- read.table(fn, header = TRUE, as.is = TRUE)
    DF2 <- subset(DF2, Date >= "1985-01-01")
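
    As noted at the top of this answer, the first column comes back as character. The following is a minimal sketch, assuming the dates are in ISO yyyy-mm-dd form as in the test data, of converting that column to Date class and then subsetting with a true date comparison rather than a string comparison. The file is rebuilt here only so the snippet runs on its own.

    ```r
    # Build a small space-separated test file (same layout as the test data).
    fn <- tempfile()
    dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
    write.table(data.frame(Date = dd, Value = seq_along(dd)),
                file = fn, row.names = FALSE)

    DF2 <- read.table(fn, header = TRUE, as.is = TRUE)
    DF2$Date <- as.Date(DF2$Date)                       # character -> Date
    DF2 <- subset(DF2, Date >= as.Date("1985-01-01"))   # keep 1985 onward

    range(DF2$Date)
    ```

    The string comparison in the original works only because ISO dates sort the same way as text; with other date formats the conversion has to come first.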
    

    read.zoo

    # or this which produces a zoo object and also automatically converts the 
    # Date column to Date class.  Note that all columns other than the Date column
    # should be numeric for it to be representable as a zoo object.
    library(zoo)
    z <- read.zoo(fn, header = TRUE)
    zw <- window(z, start = "1985-01-01")
    

    If your data is not in the same format as the example you will need to use additional arguments to read.zoo.
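
    As an illustration of such additional arguments, here is a sketch for a hypothetical comma-separated file whose dates are written as dd/mm/yyyy. The file name `fn_csv` and the date layout are assumptions made for the example, not something taken from the question; `sep` and `header` are passed through to read.table, and `format` tells read.zoo how to parse the index column.

    ```r
    library(zoo)

    # Hypothetical comma-separated file with dd/mm/yyyy dates.
    fn_csv <- tempfile(fileext = ".csv")
    dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
    write.csv(data.frame(Date = format(dd, "%d/%m/%Y"), Value = seq_along(dd)),
              file = fn_csv, row.names = FALSE)

    # format = describes how the first (index) column is converted to Date.
    z <- read.zoo(fn_csv, header = TRUE, sep = ",", format = "%d/%m/%Y")
    zw <- window(z, start = as.Date("1985-01-01"))
    ```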

    multiple read.table calls

    # if the data is very large read 1st row (DF.row1) and 1st column (DF.Date)
    # and use those to set col.names= and skip=
    
    DF.row1 <- read.table(fn, header = TRUE, nrows = 1)
    nc <- ncol(DF.row1)
    DF.Date <- read.table(fn, header = TRUE, as.is = TRUE, 
       colClasses = c(NA, rep("NULL", nc - 1)))
    n1985 <- which.max(DF.Date$Date >= "1985-01-01")
    
    DF3 <- read.table(fn, col.names = names(DF.row1), skip = n1985, as.is = TRUE)
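
    The same idea can be wrapped up for a comma-separated file, which is what the question describes. This is only a sketch: `read_from_date` is a hypothetical helper name, and it assumes the date is in the first column, in ISO yyyy-mm-dd form, and that the rows are sorted by date. It also stops with an error when no row matches, since which.max on an all-FALSE vector returns 1 and would silently read the whole file.

    ```r
    # Hypothetical helper: read a CSV starting at the first row whose date
    # (first column, ISO yyyy-mm-dd, sorted ascending) is on or after `start`.
    read_from_date <- function(file, start) {
      hdr <- names(read.csv(file, nrows = 1))        # column names only
      nc  <- length(hdr)
      # Read just the first column; "NULL" drops the remaining columns.
      dates <- read.csv(file,
                        colClasses = c("character", rep("NULL", nc - 1)))[[1]]
      hit <- which(dates >= start)
      if (length(hit) == 0) stop("no rows on or after ", start)
      # Skip the header line plus the hit[1] - 1 earlier data rows.
      read.csv(file, header = FALSE, col.names = hdr, skip = hit[1],
               stringsAsFactors = FALSE)
    }

    # Usage on a comma-separated copy of the test data
    fn_csv <- tempfile(fileext = ".csv")
    dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
    write.csv(data.frame(Date = dd, Value = seq_along(dd)),
              file = fn_csv, row.names = FALSE)
    DF3 <- read_from_date(fn_csv, "1985-01-01")
    head(DF3, 3)
    ```

    The dates column is still read in full, but every other column before 1985 is never parsed, which is where the saving comes from on wide files.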
    

    sqldf

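    # this is probably the easiest if the data set is large.
    # note: read.csv.sql assumes comma-separated input by default (sep = ",");
    # pass sep = explicitly if your file uses another delimiter.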
    
    library(sqldf)
    DF4 <- read.csv.sql(fn, sql = 'select * from file where Date >= "1985-01-01"')
    
