Question
I'm working with a large list that contains 450 dataframes. Here is an example of the dataframe names:
ALL_SM51_SE1_hourly, ALL_SM201_SE1_hourly, ALL_SM501_SE1_hourly
ALL_SM51_SE2_hourly, ALL_SM201_SE2_hourly, ALL_SM501_SE2_hourly
...................................................................
ALL_SM51_SE150_hourly, ALL_SM201_SE150_hourly, ALL_SM501_SE150_hourly
The dataframes contain measured soil moisture data at different depths (5cm, 20cm, 50cm, represented by "SM51, SM201, SM501" in the filenames) and there are 150 sensors (represented by the "SE1, SE2, SE3, ..." in the filename) which is why I have 450 dataframes that are stored in a list.
What I would like to do: I want to create a new list (a subset) for each sensor, each containing 3 elements. That is, I want one list each for SE1, SE2, SE3, ..., SE150 with the corresponding measuring depths.
I have already searched for an answer, but I only found ones that subset data by specific values, whereas I want to subset by the filenames.
Does anyone know how to do this?
Answer 1:
Using regular expressions, you can extract the sensor number of each list element (un.se) and paste "SE" in front of the unique values to get new.names. The original list lst can then be split by sensor, and within each sensor the elements are ordered by depth and converted into data.frames.
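# Sensor number of each list element, e.g. "1" for ALL_SM51_SE1_hourly
un.se <- gsub(".*SE(\\d+).*", "\\1", names(lst))
new.names <- paste0("SE", unique(un.se))
# Keep the groups in order of appearance so they line up with new.names
# (split() would otherwise sort the levels as text: "1", "10", "100", ...)
tmp <- setNames(split(lst, factor(un.se, levels = unique(un.se))), new.names)
res <- lapply(tmp, function(x) {
  # Depth code as a number (51, 201, 501) so that the ordering is numeric
  nm <- as.numeric(gsub(".*SM(\\d+).*", "\\1", names(x)))
  o <- order(nm)
  # Name each data frame after its own depth: 51 -> d5, 201 -> d20, 501 -> d50
  setNames(lapply(x[o], data.frame), paste0("d", sub("1$", "", nm[o])))
})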
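One detail worth checking once the list holds all 150 sensors: gsub returns character strings, and both order and the factor that split builds internally arrange character values as text rather than as numbers. The following is a small sketch with made-up values (depth and sensor are illustrative names, not part of the answer):

```r
# Values as gsub() would return them: character strings, not numbers
depth  <- c("51", "201", "501")
sensor <- c("1", "2", "10")

# Text ordering compares character by character, so "201" sorts before "51"
txt_ord <- order(depth)               # 2 3 1
num_ord <- order(as.numeric(depth))   # 1 2 3

# factor() sorts its levels as text unless the levels are given explicitly
txt_lev <- levels(factor(sensor))                           # "1" "10" "2"
app_lev <- levels(factor(sensor, levels = unique(sensor)))  # "1" "2" "10"
```

Converting the extracted numbers with as.numeric before ordering, and passing explicit levels to factor before splitting, keeps names and data aligned.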
Explanation of the gsub regex:
In the regex, .* matches any sequence of characters, followed by the literal SE. The parentheses ( ) form a group, inside which \\d+ matches one or more digits. In the second argument of gsub (the replacement), \\1 is a back-reference to that first group; because the pattern matches the whole string, the whole string is replaced by the group. The resulting un.se is therefore the number found after SE in each name (see: https://regex101.com/r/zuO8Ts/1; note that R needs the double escapes \\).
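To see the pattern at work in isolation, here is a minimal sketch on three sample names that follow the naming scheme from the question (the vector nms and the chosen sensor numbers are made up for illustration):

```r
# Hypothetical sample names following the pattern from the question
nms <- c("ALL_SM51_SE1_hourly", "ALL_SM201_SE12_hourly", "ALL_SM501_SE150_hourly")

# The pattern matches the whole string; "\\1" keeps only the digits after "SE"
se <- gsub(".*SE(\\d+).*", "\\1", nms)
se   # "1" "12" "150"

# Same idea for the depth code after "SM"
sm <- gsub(".*SM(\\d+).*", "\\1", nms)
sm   # "51" "201" "501"
```

Both results are character vectors, so they need as.numeric if they are to be sorted as numbers.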
The result res is a list with one element per sensor, each holding a sublist with one data frame per depth.
Result
res
# $SE1
# $SE1$d5
# x1 x2 x3
# 1 1 2 3
#
# $SE1$d20
# x1 x2 x3
# 1 1 2 3
#
# $SE1$d50
# x1 x2 x3
# 1 1 2 3
#
#
# $SE2
# $SE2$d5
# x1 x2 x3
# 1 1 2 3
#
# $SE2$d20
# x1 x2 x3
# 1 1 2 3
#
# $SE2$d50
# x1 x2 x3
# 1 1 2 3
Toy data
lst <- list(
  ALL_SM51_SE1_hourly  = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM201_SE1_hourly = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM501_SE1_hourly = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM51_SE2_hourly  = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM201_SE2_hourly = list(x1 = 1, x2 = 2, x3 = 3),
  ALL_SM501_SE2_hourly = list(x1 = 1, x2 = 2, x3 = 3)
)
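The toy data above only covers two sensors. To try the approach on something closer to the real size, a list with all 450 names can be generated. This is a sketch under the assumption that every name follows the ALL_SM<depth>_SE<sensor>_hourly pattern; big, depths, and sensors are hypothetical names:

```r
# Hypothetical full-size toy list: 3 depth codes x 150 sensors = 450 elements
depths  <- c("SM51", "SM201", "SM501")
sensors <- paste0("SE", 1:150)

# outer() varies the depth fastest, so SE1's three depths come first, then SE2's, ...
nms <- as.vector(outer(depths, sensors,
                       function(d, s) paste0("ALL_", d, "_", s, "_hourly")))

# Every element gets the same dummy content as in the small toy list
big <- setNames(replicate(length(nms), list(x1 = 1, x2 = 2, x3 = 3),
                          simplify = FALSE), nms)

length(big)           # 450
head(names(big), 4)   # the three SE1 names, then ALL_SM51_SE2_hourly
```

Running the grouping code on big instead of lst is a quick way to confirm that sensors beyond SE9 end up under the right names.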
Source: https://stackoverflow.com/questions/60926815/subset-data-in-a-large-list-based-on-filename-of-the-dataframes-in-the-list