Question
UPDATE
I've managed to load the data of the first 1000000 rows using the following code:
newFile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings = "?", nrows= 1000000, stringsAsFactors=TRUE)
This is what head() returns, as an FYI:
head(newFile)
Date Time Global_active_power Global_reactive_power Voltage Global_intensity
1 16/12/2006 17:24:00 4.216 0.418 234.84 18.4
2 16/12/2006 17:25:00 5.360 0.436 233.63 23.0
3 16/12/2006 17:26:00 5.374 0.498 233.29 23.0
4 16/12/2006 17:27:00 5.388 0.502 233.74 23.0
5 16/12/2006 17:28:00 3.666 0.528 235.68 15.8
6 16/12/2006 17:29:00 3.520 0.522 235.02 15.0
Sub_metering_1 Sub_metering_2 Sub_metering_3
1 0 1 17
2 0 1 16
3 0 2 17
4 0 1 17
5 0 1 17
6 0 2 17
Now I need to subset, because I only need the data from the dates 2007-02-01 and 2007-02-02. I think I need to convert the Date and Time variables to Date/Time classes in R using the strptime() and as.Date() functions, but I'm not clear on how to do that. What is the simplest/cleanest way to do this?
Answer 1:
If size/memory is not an issue:
newFile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings = "?", nrows= 1000000,
                      stringsAsFactors=FALSE)
# Combine the Date and Time strings into one column
newFile$DateTime <- paste(newFile$Date, newFile$Time)
# Parse the combined string; as.Date() keeps only the date part and drops the time
newFile$DateTime <- as.Date(newFile$DateTime, format = "%d/%m/%Y %H:%M:%S")
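Since as.Date() discards the time of day, a date-time class may be preferable if the time is needed later. The following is a minimal sketch, not part of the original answer: the small `sample_df` data frame is a hypothetical stand-in for the real file, and the "UTC" time zone is an assumption.

```r
# Hypothetical sample mimicking the Date and Time columns from the question.
sample_df <- data.frame(
  Date = c("16/12/2006", "1/2/2007"),
  Time = c("17:24:00", "00:01:00"),
  stringsAsFactors = FALSE
)

# as.POSIXct() keeps both the date and the time; tz = "UTC" is an assumption.
sample_df$DateTime <- as.POSIXct(
  paste(sample_df$Date, sample_df$Time),
  format = "%d/%m/%Y %H:%M:%S",
  tz = "UTC"
)

# strptime() accepts the same format string but returns a POSIXlt object.
parsed_lt <- strptime("16/12/2006 17:24:00",
                      format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
```

POSIXct is usually the more convenient of the two for data frame columns, because it is stored as a plain numeric vector.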
If your computer is too weak and puny for that, but you can install packages, consider the data.table package:
library(data.table)
newFile <- fread("course_4_proj_1.txt", na.strings = "?")
newFile[,DateTime := as.Date(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S")]
and there are further optimizations one can use. I found answers here useful.
One can then subset the data.frame in the normal way. Here is a method using dplyr:
library(dplyr)
subsetted <- filter(newFile, DateTime >= as.Date("2007-02-01"), DateTime < as.Date("2007-02-03"))
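For readers without dplyr, the same range filter can be sketched in base R with logical indexing. This is an illustrative example only: `sample_df` and its values are hypothetical, and the range covers the two dates the question asks for (2007-02-01 and 2007-02-02).

```r
# Hypothetical sample with a Date-class column, as produced by as.Date().
sample_df <- data.frame(
  DateTime = as.Date(c("2007-01-31", "2007-02-01", "2007-02-02", "2007-02-03")),
  Voltage  = c(234.8, 233.6, 233.3, 233.7)
)

start_date <- as.Date("2007-02-01")  # inclusive lower bound
end_date   <- as.Date("2007-02-03")  # exclusive upper bound

# Keep rows whose date falls in [start_date, end_date).
in_range  <- sample_df$DateTime >= start_date & sample_df$DateTime < end_date
subsetted <- sample_df[in_range, ]
```

Using a half-open range (inclusive start, exclusive end) means the same bounds still select the right rows if the column is later switched to a date-time class.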
Answer 2:
The standard R read.table functions always read in the entire data set first. You might consider filtering the file some other way before reading it into R, or using a package like sqldf, which has a read.csv.sql function that can filter data on import. I haven't tested it with date classes yet.
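One possible way to filter before parsing, using only base R, is to read the file as raw text lines, keep the header plus the lines for the wanted dates, and parse only those. This is a rough sketch under stated assumptions rather than the approach this answer had in mind: the temporary file and its contents are hypothetical, and the pattern assumes dates are written day/month/year with an optional leading zero.

```r
# Write a tiny hypothetical file in the same ";"-separated layout.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Date;Time;Global_active_power",
  "31/1/2007;23:59:00;1.2",
  "1/2/2007;00:00:00;?",
  "2/2/2007;23:59:00;0.5",
  "3/2/2007;00:00:00;0.7"
), tmp)

all_lines <- readLines(tmp)

# Match lines dated 1 or 2 February 2007 (leading zeros optional).
keep <- grepl("^0?[12]/0?2/2007;", all_lines)
keep[1] <- TRUE  # always keep the header line

# Parse only the retained lines into a data frame.
filtered <- read.table(text = all_lines[keep], header = TRUE, sep = ";",
                       na.strings = "?", stringsAsFactors = FALSE)
```

readLines() still holds every line in memory as text, so this only avoids parsing the unwanted rows; for very large files, filtering with an external tool before R opens the file would reduce memory use further.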
Source: https://stackoverflow.com/questions/24006475/subsetting-data-based-on-a-date-range-in-r