Read csv from specific row

后悔当初 2020-12-04 18:27

I have daily data starting from 1980 in a CSV file, but I want to read only the data from 1985 onward, because the dataset in another file starts in 1985. How can I skip reading

3 answers

萌比男神i (original poster)

2020-12-04 18:55

Here are a few alternatives. (You may wish to convert the first column to "Date" class afterwards, and possibly convert the whole data frame to a zoo object or another time series class.)

# create test data (write.table's defaults give a space-delimited file with a header row)
fn <- tempfile()
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
DF <- data.frame(Date = dd, Value = seq_along(dd))
write.table(DF, file = fn, row.names = FALSE)

read.table + subset

# if file is small enough to fit in memory try this:

DF2 <- read.table(fn, header = TRUE, as.is = TRUE)
DF2 <- subset(DF2, Date >= "1985-01-01")
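The comparison above works on character strings only because the dates are in ISO format (YYYY-MM-DD), which sorts the same way as text and as dates. A minimal sketch of the Date-class conversion mentioned at the top of this answer, assuming the same test file layout (the file is recreated here so the snippet runs on its own):

```r
# Recreate the space-delimited test file from the answer
fn <- tempfile()
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
write.table(data.frame(Date = dd, Value = seq_along(dd)),
            file = fn, row.names = FALSE)

DF2 <- read.table(fn, header = TRUE, as.is = TRUE)
# Convert the character column to Date class (assumes ISO-formatted dates)
DF2$Date <- as.Date(DF2$Date)
# Compare Date with Date rather than relying on string ordering
DF2 <- subset(DF2, Date >= as.Date("1985-01-01"))
```

For dates stored in another layout, as.Date() would need a format= argument matching the file.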

read.zoo

# or this which produces a zoo object and also automatically converts the 
# Date column to Date class.  Note that all columns other than the Date column
# should be numeric for it to be representable as a zoo object.
library(zoo)
z <- read.zoo(fn, header = TRUE)
zw <- window(z, start = as.Date("1985-01-01"))

If your data is not in the same format as the example (for instance a different separator or date format), you will need to pass additional arguments to read.zoo, such as sep= or format=.
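As an illustration of such additional arguments, here is a sketch for a hypothetical comma-separated file whose dates are written as day/month/year. The file name fn_csv and this layout are assumptions for the example, not something from the question.

```r
library(zoo)

# Hypothetical file layout: comma-separated, dates written as day/month/year
fn_csv <- tempfile(fileext = ".csv")
dd <- seq(as.Date("1984-12-30"), as.Date("1985-01-03"), by = "day")
write.csv(data.frame(Date = format(dd, "%d/%m/%Y"), Value = seq_along(dd)),
          fn_csv, row.names = FALSE)

# header= and sep= are passed through to read.table;
# format= tells read.zoo how to parse the index column into Date class
z <- read.zoo(fn_csv, header = TRUE, sep = ",", format = "%d/%m/%Y")
zw <- window(z, start = as.Date("1985-01-01"))
```

Here zw keeps only the observations dated 1985-01-01 or later.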

multiple read.table's

# if the data is very large read 1st row (DF.row1) and 1st column (DF.Date)
# and use those to set col.names= and skip=

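DF.row1 <- read.table(fn, header = TRUE, nrows = 1)
nc <- ncol(DF.row1)
# read only the first column; "NULL" in colClasses skips the other columns
DF.Date <- read.table(fn, header = TRUE, as.is = TRUE, 
   colClasses = c(NA, rep("NULL", nc - 1)))
# index of the first data row dated 1985-01-01 or later
# (which.max returns 1 if no row qualifies, so check any() first on real data)
n1985 <- which.max(DF.Date$Date >= "1985-01-01")

# skip = n1985 skips the header line plus the n1985 - 1 earlier data rows
DF3 <- read.table(fn, col.names = names(DF.row1), skip = n1985, as.is = TRUE)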
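The two-pass idea can be wrapped in a small helper for a real comma-separated file. This is only a sketch: read_from_date is a hypothetical name, and it assumes ISO dates in the first column, rows sorted by date, and a single header line. It also guards against the case where no row qualifies, since which.max() would otherwise return 1 and the whole file would be read.

```r
# Hypothetical helper: read a delimited file starting at the first row whose
# first-column date is on or after `start` (ISO dates, sorted ascending).
read_from_date <- function(file, start, sep = ",") {
  header <- read.table(file, header = TRUE, sep = sep, nrows = 1)
  nc <- ncol(header)
  # Pass 1: read only the first column; "NULL" drops the remaining columns
  dates <- read.table(file, header = TRUE, sep = sep, as.is = TRUE,
                      colClasses = c("character", rep("NULL", nc - 1)))[[1]]
  keep <- dates >= start
  # No qualifying row: return an empty data frame with the right columns
  if (!any(keep)) return(header[0, , drop = FALSE])
  first <- which.max(keep)
  # Pass 2: skip the header line plus the (first - 1) earlier data rows
  read.table(file, sep = sep, col.names = names(header), skip = first,
             as.is = TRUE)
}

# Usage on a small comma-separated test file
fn_csv <- tempfile(fileext = ".csv")
dd <- seq(as.Date("1984-12-30"), as.Date("1985-01-03"), by = "day")
write.csv(data.frame(Date = dd, Value = seq_along(dd)), fn_csv,
          row.names = FALSE)
res <- read_from_date(fn_csv, "1985-01-01")
none <- read_from_date(fn_csv, "1990-01-01")
```

The first pass still scans the whole file, but it holds only one column in memory, which is the point of this approach for very large files.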

sqldf

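# this is probably the easiest if the data set is large: the file is filtered
# by the SQL statement and only the matching rows are returned to R.
# read.csv.sql defaults to sep = ",", so pass sep = " " for the space-delimited test file.

library(sqldf)
DF4 <- read.csv.sql(fn, sql = "select * from file where Date >= '1985-01-01'", sep = " ")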
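Since the question concerns a CSV file while the test file in this answer is space-delimited, here is a sketch of the same query against a comma-separated copy. The name fn_csv is hypothetical, and quote = FALSE is used on the assumption that read.csv.sql does not strip double quotes from fields.

```r
library(sqldf)

# Hypothetical comma-separated copy of the test data;
# quote = FALSE keeps double quotes out of the file
fn_csv <- tempfile(fileext = ".csv")
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
write.csv(data.frame(Date = dd, Value = seq_along(dd)), fn_csv,
          row.names = FALSE, quote = FALSE)

# The default sep = "," matches this file; the string literal is
# single-quoted, as in standard SQL
DF5 <- read.csv.sql(fn_csv,
                    sql = "select * from file where Date >= '1985-01-01'")
```

The Date column comes back as text here, so it can be converted with as.Date() afterwards if needed.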
