R: How to group and aggregate list elements using regex?

非 Y 不嫁゛ submitted on 2019-12-12 18:34:00

Question


I want to aggregate (sum up) the following product list by groups (see below):

prods <- list("101.2000"=data.frame(1,2,3),
              "102.2000"=data.frame(4,5,6),
              "103.2000"=data.frame(7,8,9),
              "104.2000"=data.frame(1,2,3),
              "105.2000"=data.frame(4,5,6),
              "106.2000"=data.frame(7,8,9),
              "101.2001"=data.frame(1,2,3),
              "102.2001"=data.frame(4,5,6),
              "103.2001"=data.frame(7,8,9),
              "104.2001"=data.frame(1,2,3),
              "105.2001"=data.frame(4,5,6),
              "106.2001"=data.frame(7,8,9))
test= list("100.2000"=data.frame(2,3,5),
           "100.2001"=data.frame(4,5,6))
names <- c("A", "B", "C")
prods <- lapply(prods, function (x) {colnames(x) <- names; return(x)})

The name of each element of the product list (prods) combines the product number and the year (e.g. in 101.2000, 101 is the product number and 2000 is the year). The groups contain only product numbers, and they define how the products are aggregated.

group1 <- c(101, 106)
group2 <- c(102, 104)
group3 <- c(105, 103)
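
Because the groups list only product numbers, one way to use them in an aggregation is to invert them into a named lookup vector that maps each product number to its group name. A minimal sketch; `groups` and `lookup` are hypothetical names that are not part of the original code:

```r
group1 <- c(101, 106)
group2 <- c(102, 104)
group3 <- c(105, 103)

# Hypothetical helper: collect the groups in one named list ...
groups <- list(group1 = group1, group2 = group2, group3 = group3)

# ... and invert it into "product number -> group name".
# rep() repeats each group name once per product in that group; the product
# numbers (coerced to character by setNames) become the lookup keys.
lookup <- setNames(rep(names(groups), lengths(groups)), unlist(groups))

unname(lookup[c("101", "104", "103")])  # "group1" "group2" "group3"
```

Indexing `lookup` with the product part of each name in prods would then give the group for every list element.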

My expected result shows the aggregated product groups by year:

$group1.2000
  A  B  C
1 8 10 12

$group2.2000
  A B C
1 5 7 9

$group3.2000
   A  B  C
1 11 13 15

$group1.2001
  A  B  C
1 8 10 12

$group2.2001
  A B C
1 5 7 9

$group3.2001
   A  B  C
1 11 13 15

So far, I have tried the following. First, I split the names of prods to get the product numbers:

prodnames <- names(prods)
prodnames_sub <- gsub("\\..*.","", prodnames)
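
The year can be pulled out the same way, by deleting everything up to the last dot instead of everything from the first dot (the trailing `.` in the pattern above is not needed, since `.*` already runs to the end of the string). A minimal sketch on two of the names from prods; `nm`, `prod_nr`, `year` and `parts` are illustrative names only:

```r
nm <- c("101.2000", "106.2001")   # two names taken from prods

prod_nr <- sub("\\..*", "", nm)   # drop the first "." and everything after it
year    <- sub(".*\\.", "", nm)   # drop everything up to and including the last "."

# strsplit() gets both parts in one pass; rbind() stacks them one row per name
parts <- do.call(rbind, strsplit(nm, ".", fixed = TRUE))

prod_nr   # "101" "106"
year      # "2000" "2001"
```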

And then I tried to aggregate using lapply:

lapply(prods, function(x) aggregate(..., FUN = sum))

However, I couldn't work out how to use those product numbers in the aggregation function. Any ideas? Thanks


Answer 1:


Here are two approaches. No packages are used in either one.

1) Using lists. Create a two-column data frame S from the groups, whose columns are the products (values column) and the associated groups (ind column). Then create the list to split by, By. In the code that produces By, sub("\\..*", "", names(prods)) extracts the products and match is then used to look up the associated group, while sub(".*\\.", "", names(prods)) extracts the year. Next, perform the split and lapply over it to compute the sums. Each component of the result is a named numeric vector rather than a one-row data frame. The two components of By (group and year) can be swapped to change the order of the output, if desired.

S <- stack(list(group1 = group1, group2 = group2, group3 = group3))
By <- list(group = S$ind[match(sub("\\..*", "", names(prods)), S$values)],
           year = sub(".*\\.", "", names(prods)))
lapply(split(prods, By), function(x) colSums(do.call(rbind, x)))
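
If one-row data frames are wanted, as in the expected output in the question, a possible variant is to add the data frames in each split component with Reduce(`+`, ...), because arithmetic on data frames of the same shape works element by element. A self-contained sketch that rebuilds the question's data so it runs on its own; `res` is an illustrative name:

```r
prods <- list("101.2000" = data.frame(1, 2, 3), "102.2000" = data.frame(4, 5, 6),
              "103.2000" = data.frame(7, 8, 9), "104.2000" = data.frame(1, 2, 3),
              "105.2000" = data.frame(4, 5, 6), "106.2000" = data.frame(7, 8, 9),
              "101.2001" = data.frame(1, 2, 3), "102.2001" = data.frame(4, 5, 6),
              "103.2001" = data.frame(7, 8, 9), "104.2001" = data.frame(1, 2, 3),
              "105.2001" = data.frame(4, 5, 6), "106.2001" = data.frame(7, 8, 9))
prods <- lapply(prods, setNames, c("A", "B", "C"))   # same column names as the question

group1 <- c(101, 106); group2 <- c(102, 104); group3 <- c(105, 103)

# Same S and By as approach 1
S  <- stack(list(group1 = group1, group2 = group2, group3 = group3))
By <- list(group = S$ind[match(sub("\\..*", "", names(prods)), S$values)],
           year  = sub(".*\\.", "", names(prods)))

# Reduce(`+`, x) adds the one-row data frames of a component element-wise,
# so every result is still a data frame with columns A, B and C.
res <- lapply(split(prods, By), function(x) Reduce(`+`, x))
res$group1.2000   # A = 8, B = 10, C = 12
```

Compared with colSums(do.call(rbind, x)), this skips the intermediate rbind, at the cost of relying on every component having identically shaped data frames.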

2) Using data frames. Convert the groups and prods each to a data frame, merge them, perform an aggregate and split the result back into a list. The output is the same as requested except for the order. (Reverse the two right-hand-side variables in the aggregate formula to get the order shown in the question, but that will also reverse the two parts of each component name in the output list.)

S <- stack(list(group1 = group1, group2 = group2, group3 = group3))

DF0 <- do.call(rbind, prods)
DF <- cbind(do.call(rbind, strsplit(rownames(DF0), ".", fixed = TRUE)), DF0)

M <- merge(DF, S, all.x = TRUE, by = 1)
Ag <- aggregate(cbind(A, B, C) ~ ind + `2`, M, sum)
lapply(split(Ag, paste(Ag[[1]], Ag[[2]], sep = ".")), "[", 3:5)
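
An alternative to reversing the formula is to keep the group.year names and reorder the finished list afterwards. A minimal sketch; `year_major` is a hypothetical helper, and the list it is tried on uses the six names from the output below with placeholder values 1 to 6:

```r
# Hypothetical helper: reorder a list whose names look like "<group>.<year>"
# so that all groups of the first year come first, as in the question.
year_major <- function(res) {
  year  <- sub(".*\\.", "", names(res))   # part after the last dot
  group <- sub("\\..*", "", names(res))   # part before the first dot
  res[order(year, group)]
}

# Placeholder values; only the names matter for the reordering
res <- setNames(as.list(1:6), c("group1.2000", "group1.2001", "group2.2000",
                                 "group2.2001", "group3.2000", "group3.2001"))
names(year_major(res))
# "group1.2000" "group2.2000" "group3.2000" "group1.2001" "group2.2001" "group3.2001"
```

Wrapping the last line of approach 2 in year_major(...) should then give the question's order without changing the component names.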

giving:

$group1.2000
  A  B  C
1 8 10 12

$group1.2001
  A  B  C
4 8 10 12

$group2.2000
  A B C
2 5 7 9

$group2.2001
  A B C
5 5 7 9

$group3.2000
   A  B  C
3 11 13 15

$group3.2001
   A  B  C
6 11 13 15


Source: https://stackoverflow.com/questions/33837293/r-how-to-group-and-aggregate-list-elements-using-regex
