Subsetting Data based on a date range in R

天大地大妈咪最大 submitted on 2019-12-23 05:51:08

Question


UPDATE

I've managed to load the first 1000000 rows of data using the following code:

newFile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings = "?", nrows= 1000000, stringsAsFactors=TRUE)

For reference, this is what head() returns:

head(newFile)
        Date     Time Global_active_power Global_reactive_power Voltage Global_intensity
1 16/12/2006 17:24:00               4.216                 0.418  234.84             18.4
2 16/12/2006 17:25:00               5.360                 0.436  233.63             23.0
3 16/12/2006 17:26:00               5.374                 0.498  233.29             23.0
4 16/12/2006 17:27:00               5.388                 0.502  233.74             23.0
5 16/12/2006 17:28:00               3.666                 0.528  235.68             15.8
6 16/12/2006 17:29:00               3.520                 0.522  235.02             15.0
  Sub_metering_1 Sub_metering_2 Sub_metering_3
1              0              1             17
2              0              1             16
3              0              2             17
4              0              1             17
5              0              1             17
6              0              2             17

Now I need to subset, because I only need the data from the dates 2007-02-01 and 2007-02-02. I think I need to convert the Date and Time variables to Date/Time classes in R using the strptime() and as.Date() functions, but I'm not clear on how to do that. What is the simplest/cleanest way to do this?


Answer 1:


If size/memory is not an issue,

newFile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings = "?", nrows= 1000000, 
    stringsAsFactors=FALSE)
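newFile$DateTime <- paste(newFile$Date, newFile$Time)
# as.Date() keeps only the calendar day; use as.POSIXct() with the same format if you need the time of day
newFile$DateTime <- as.Date(newFile$DateTime, format = "%d/%m/%Y %H:%M:%S")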
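
A self-contained sketch of the same conversion, using a few hypothetical rows in the file's semicolon-separated Date;Time layout (only the first row comes from the head() output above). It contrasts as.Date(), which keeps only the calendar day, with as.POSIXct(), which accepts the same strptime() format codes and keeps the time of day. The tz = "UTC" choice is an assumption made here to avoid daylight-saving gaps in local time zones.

```r
# Hypothetical sample in the same ";"-separated layout as the file
sample_txt <- "Date;Time
16/12/2006;17:24:00
01/02/2007;00:00:00
02/02/2007;23:59:00"

df <- read.table(text = sample_txt, header = TRUE, sep = ";",
                 stringsAsFactors = FALSE)

# Combine the two character columns into one timestamp string per row
stamp <- paste(df$Date, df$Time)

# as.Date() parses the string but keeps only the calendar date
df$Day <- as.Date(stamp, format = "%d/%m/%Y %H:%M:%S")

# as.POSIXct() also keeps hours/minutes/seconds (UTC assumed to avoid DST issues)
df$DateTime <- as.POSIXct(stamp, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

print(df[, c("Day", "DateTime")])
```

For filtering whole days either class works; the POSIXct column only matters if you later need to select by time of day.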

If your computer is too weak and puny, but you can add packages, consider the data.table package

library(data.table)
newFile <- fread("course_4_proj_1.txt", na.strings = "?")

newFile[,DateTime := as.Date(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S")]

and there are further optimizations one can use; I found the answers here useful.

One can then subset the data.frame in the normal way. Here is a method using dplyr:

library(dplyr)
# Keep rows dated 1 and 2 February 2007 (the upper bound is exclusive)
subsetted <- filter(newFile, DateTime >= as.Date("2007-02-01"), DateTime < as.Date("2007-02-03"))
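
If you would rather not load dplyr, base R's subset() or plain logical indexing does the same filtering. A minimal sketch, assuming a Date-class column (called Day here, a name chosen for illustration) built from the Date field; the sample rows are hypothetical. For whole-day data, matching the two target dates with %in% is an equivalent alternative to the half-open range.

```r
# Hypothetical sample rows; only the Date and Time columns are needed here
sample_txt <- "Date;Time
16/12/2006;17:24:00
31/01/2007;23:59:00
01/02/2007;00:00:00
02/02/2007;23:59:00
03/02/2007;00:00:00"

df <- read.table(text = sample_txt, header = TRUE, sep = ";",
                 stringsAsFactors = FALSE)
df$Day <- as.Date(df$Date, format = "%d/%m/%Y")

# Half-open range: on or after 1 Feb 2007 and strictly before 3 Feb 2007
subsetted <- subset(df, Day >= as.Date("2007-02-01") & Day < as.Date("2007-02-03"))

# Equivalent for whole days: match the two target dates directly
subsetted2 <- df[df$Day %in% as.Date(c("2007-02-01", "2007-02-02")), ]

print(subsetted)
```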



Answer 2:


The standard R read.table functions read the entire file (or the first nrows rows) into memory before you can subset it. You might consider filtering the file some other way before reading it into R, or using a package like sqldf, which has a read.csv.sql function that can filter data on import. I haven't tested it with date classes yet.
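
One base-R way to filter before parsing, offered as a sketch rather than something this answer tested: read the raw lines with readLines(), keep the header plus the lines for 1 and 2 February 2007, and parse only those with read.table(text = ...). The whole file is still read as text, but only the matching rows are split into columns and converted. The temporary file and its rows are hypothetical, and the regular expression accepts days and months with or without leading zeros, since the head() output above doesn't show how single-digit values are written.

```r
# Write a small hypothetical file in the same ";"-separated layout so the sketch runs anywhere
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Date;Time;Global_active_power",
  "16/12/2006;17:24:00;4.216",
  "31/1/2007;23:59:00;?",
  "1/2/2007;00:00:00;?",
  "2/2/2007;23:59:00;?",
  "3/2/2007;00:00:00;?"
), path)

# Read every line as plain text (no parsing into columns yet)
lines <- readLines(path)

# Keep the header plus rows dated 1 or 2 Feb 2007, with or without leading zeros
keep <- grepl("^0?[12]/0?2/2007;", lines)
keep[1] <- TRUE

# Parse only the kept lines; "?" marks missing values as in the original file
febData <- read.table(text = lines[keep], header = TRUE, sep = ";",
                      na.strings = "?", stringsAsFactors = FALSE)
print(febData)
```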



Source: https://stackoverflow.com/questions/24006475/subsetting-data-based-on-a-date-range-in-r
