Question
It's pretty hard to find a title for my question because it's very specific.
My problem is: I have around 9000 files of data collected over different periods. The filenames contain those periods, and I only want to load into R the files that contain at least 17/18 years of data collection.
I created a test list to show what I mean:
list = c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
"AT0ILL10000700500dymax.1-1-1990.31-12-2011",
"AT0PIL10000700500dymax.1-1-1992.31-12-2011",
"AT0SON10000700100dymax.1-1-1990.31-12-2011",
"AT0STO10000700100dymax.1-1-1992.31-12-2006",
"AT0VOR10000700500dymax.1-1-1991.31-12-2011",
"AT110020000700100dymax.1-1-1993.31-12-2008",
"AT2HE190000700100dymax.1-1-1993.31-12-2000",
"AT2KA110000700500dymax.1-1-1991.31-12-2010",
"AT2KA410000700500dymax.1-1-1991.31-12-2011")
These are the filenames. Now I want to extract all filenames whose measurements span at least 18 years. For example, the 1st file should be dropped because the period is too short; the 2nd one is fine. So I need something that either compares the dates (only the years) or checks something like start year + 18.
Oh, and the filenames don't all have the same length! This is only an example.
I have no clue how to do that. Can somebody please help?
Answer 1:
Assuming the dates are always separated by ".", you can use string split. Here's an example getting the time difference in days.
split_list = strsplit(list, split=".", fixed=TRUE)
from = unlist(lapply(split_list, "[[", 2))
to = unlist(lapply(split_list, "[[", 3))
from = as.POSIXct(from, format="%d-%m-%Y")
to = as.POSIXct(to, format="%d-%m-%Y")
# units must be passed by name: the third positional argument of difftime() is tz
difftime(to, from, units="days")
To get the time difference in years, there are a few different solutions you can use. Here are two related questions:
R: How to calculate the difference in years between a date and a year
R get date difference in years (floating point)
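A minimal sketch of the "start year + 18" idea from the question, built on the same split approach. It assumes the last two dot-separated fields of every filename are the start and end dates in day-month-year form, and that the end date is inclusive. The names `files`, `min_years`, and `cutoff` are made up for illustration, and only three of the example filenames are used.

```r
# Three of the example filenames; named `files` to avoid masking base::list
files <- c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
           "AT0ILL10000700500dymax.1-1-1990.31-12-2011",
           "AT0STO10000700100dymax.1-1-1992.31-12-2006")

parts <- strsplit(files, split = ".", fixed = TRUE)

# Take the last two fields so that the length of the name prefix does not matter
from <- as.Date(vapply(parts, function(p) p[length(p) - 1], character(1)),
                format = "%d-%m-%Y")
to <- as.Date(vapply(parts, function(p) p[length(p)], character(1)),
              format = "%d-%m-%Y")

# Shift each start date forward by min_years calendar years
min_years <- 18
cutoff <- as.POSIXlt(from)
cutoff$year <- cutoff$year + min_years
cutoff <- as.Date(cutoff)

# The end date is treated as inclusive (assumption), hence the + 1 day
keep <- (to + 1) >= cutoff
files[keep]
```

Comparing against a shifted calendar date sidesteps the leap-year rounding you get when dividing a number of days by 365 or 365.25.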
Answer 2:
An alternative solution that makes some assumptions but gets cleanly at the desired output.
year_to <- as.integer(sub(".*([0-9]{4}$)", "\\1", list))
year_from <- as.integer(sub(".*-([0-9]{4})\\..*", "\\1", list))
# Assume all "from" dates start on Jan 01 and "to" dates end Dec 31
# Then the difference is
diff <- year_to - year_from + 1
diff >= 18
# [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
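To get from that logical vector to the filenames themselves, one option is to use it as an index. The sketch below assumes the same filename pattern as the example (names ending in a four-digit year, with the start year followed by a dot). The names `files`, `n_years`, and `selected` are made up for illustration, and the loading step is left commented out because `data_dir` and the choice of `read.table` are pure assumptions about how the files are stored.

```r
# Four of the example filenames; named `files` to avoid masking base::list
files <- c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
           "AT0ILL10000700500dymax.1-1-1990.31-12-2011",
           "AT110020000700100dymax.1-1-1993.31-12-2008",
           "AT2KA110000700500dymax.1-1-1991.31-12-2010")

# Same regular expressions as above: end year at the end of the name,
# start year right before the dot that separates the two dates
year_to <- as.integer(sub(".*([0-9]{4})$", "\\1", files))
year_from <- as.integer(sub(".*-([0-9]{4})\\..*", "\\1", files))

# Full calendar years covered, assuming Jan 1 start and Dec 31 end
n_years <- year_to - year_from + 1

# Keep only the names covering at least 18 years
selected <- files[n_years >= 18]
print(selected)

# Hypothetical loading step (data_dir and the reader are assumptions):
# data_dir <- "path/to/data"
# data <- lapply(file.path(data_dir, selected), read.table)
```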
Source: https://stackoverflow.com/questions/47521355/r-how-to-choose-files-by-dates-in-file-names