Appending a row of sums for each level of a factor

痞子三分冷 submitted on 2019-12-06 15:01:27

You can add an extra Res column set to "Total" to the summary, then bind_rows() it with the original data frame and sort by Reg:

library(dplyr)

df %>%
    group_by(Reg) %>%
    summarise(Pop = sum(Pop), Res = 'Total') %>%
    bind_rows(df) %>% 
    arrange(Reg)

# A tibble: 15 x 3
#     Reg     Pop   Res
#   <chr>   <int> <chr>
# 1     A 1000915 Total
# 2     A  500414 Urban
# 3     A  500501 Rural
# 4     B  999938 Total
# 5     B  499922 Urban
# 6     B  500016 Rural
# 7     C 1000912 Total
# 8     C  501638 Urban
# 9     C  499274 Rural
#10     D  999629 Total
#11     D  499804 Urban
#12     D  499825 Rural
#13     E 1000303 Total
#14     E  499917 Urban
#15     E  500386 Rural
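For readers without dplyr, the same idea (sum per region, label the new rows "Total", stack them with the details, then sort by Reg) can be sketched in base R with aggregate(). This is an illustrative sketch, not part of the original answer; the Pop values are copied from the printed output above so the example does not depend on rpois() and the random seed.

```r
# Sample data: Pop values copied from the printed output above,
# so the example does not depend on the random number generator
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
Pop <- c(500414L, 500501L, 499922L, 500016L, 501638L,
         499274L, 499804L, 499825L, 499917L, 500386L)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)

# One row per region holding the regional sum, tagged Res = "Total"
totals <- aggregate(Pop ~ Reg, data = df, FUN = sum)
totals$Res <- "Total"

# Stack the totals on top of the detail rows (same column order as df)
out <- rbind(totals[, names(df)], df)

# order() is stable, so within each region the total (stacked first)
# stays ahead of the detail rows, which keep their original order
out <- out[order(out$Reg), ]
rownames(out) <- NULL
out
```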

A corresponding data.table solution:

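library(data.table)

dt <- setDT(df)
# Totals first, then the detail rows; use.names = TRUE aligns the columns by name.
# Sorting by Reg afterwards keeps each region's total on top, since ties keep their original order.
rbindlist(list(dt[, .(Pop = sum(Pop), Res = 'Total'), by = Reg], dt), use.names = TRUE)[order(Reg)]

A base R alternative, starting from the original data frame, splits the data by region and prepends a total row to each piece:

lapply(split(df, df$Reg),
       function(a) rbind(data.frame(Reg = a$Reg[1],
                                    Res = "Total",
                                    Pop = sum(a$Pop)),
                         a))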
$A
  Reg   Res     Pop
1   A Total 1000915
2   A Urban  500414
3   A Rural  500501

$B
  Reg   Res    Pop
1   B Total 999938
3   B Urban 499922
4   B Rural 500016

$C
  Reg   Res     Pop
1   C Total 1000912
5   C Urban  501638
6   C Rural  499274

$D
  Reg   Res    Pop
1   D Total 999629
7   D Urban 499804
8   D Rural 499825

$E
   Reg   Res     Pop
1    E Total 1000303
9    E Urban  499917
10   E Rural  500386

If you want a single data frame instead of a list, combine the pieces with do.call(rbind, ...).
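A minimal base R sketch of that conversion, rebuilding the list from the answer above (Pop values copied from the printed output; the name combined is just illustrative). rbind() builds row names from the list names, so they are reset afterwards:

```r
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
Pop <- c(500414L, 500501L, 499922L, 500016L, 501638L,
         499274L, 499804L, 499825L, 499917L, 500386L)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)

# One data frame per region, each with a total row on top
pieces <- lapply(split(df, df$Reg), function(a)
  rbind(data.frame(Reg = a$Reg[1], Res = "Total", Pop = sum(a$Pop),
                   stringsAsFactors = FALSE),
        a))

# Bind the list into one data frame and replace the list-derived row names with 1..n
combined <- do.call(rbind, pieces)
rownames(combined) <- NULL
combined
```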

Stacking and rearranging will work:

library(dplyr)

Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)


sums <- df %>%
  group_by(Reg) %>%
  summarise(Pop = sum(Pop)) %>%
  mutate(Res = "Total")

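# Res != "Total" is FALSE for the total rows, so they sort first within each Reg
# (a plain arrange(Reg, Res) sorts alphabetically and puts "Total" between "Rural" and "Urban")
df_sums <- bind_rows(df, sums) %>%
  arrange(Reg, Res != "Total")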
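Where the Total row lands depends entirely on how Res is sorted. Another option is to sort on a factor whose levels spell out the desired order. The sketch below uses base R, with aggregate() standing in for the dplyr summary; the level order Total, Urban, Rural is an assumption chosen to match the first answer's output.

```r
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
Pop <- c(500414L, 500501L, 499922L, 500016L, 501638L,
         499274L, 499804L, 499825L, 499917L, 500386L)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)

# Per-region sums labelled "Total", stacked below the detail rows
sums <- aggregate(Pop ~ Reg, data = df, FUN = sum)
sums$Res <- "Total"
stacked <- rbind(df, sums[, names(df)])

# Rank Res by explicit factor levels instead of alphabetically
res_rank <- factor(stacked$Res, levels = c("Total", "Urban", "Rural"))
df_sums <- stacked[order(stacked$Reg, res_rank), ]
rownames(df_sums) <- NULL
df_sums
```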

We can use dplyr and purrr. This is similar to d.b's method, but map_dfr() returns a data frame directly, so no further conversion from a list is needed. Notice that I used tibble() (the successor of the now-deprecated data_frame()) to construct df, so Reg and Res stay character vectors; factors are not needed for this analysis. df2 is the final output.

library(dplyr)
library(purrr)

df <- tibble(Reg, Res, Pop)

df2 <- df %>%
  split(.$Reg) %>%
  # For each region, append a row holding the regional total
  map_dfr(~ bind_rows(.x, tibble(Reg = .x$Reg[1], Res = "Total", Pop = sum(.x$Pop))))

df2 
# A tibble: 15 x 3
     Reg   Res     Pop
   <chr> <chr>   <int>
 1     A Urban  500414
 2     A Rural  500501
 3     A Total 1000915
 4     B Urban  499922
 5     B Rural  500016
 6     B Total  999938
 7     C Urban  501638
 8     C Rural  499274
 9     C Total 1000912
10     D Urban  499804
11     D Rural  499825
12     D Total  999629
13     E Urban  499917
14     E Rural  500386
15     E Total 1000303

Your data:

Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df  <- data.frame(Reg, Res, Pop)

library(dplyr)
df1 <- df %>%
  group_by(Reg) %>%
  summarise(Total = sum(Pop))

My solution (note that the result of the pipe above is assigned to df1):

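# Append one "Total" row per region
df <- rbind(df, data.frame(Reg = df1$Reg, Res = "Total", Pop = df1$Total))

# Sort regions in decreasing order, then reverse all rows: regions end up
# ascending, with each region's rows in reverse order (Total first)
df <- df[order(as.character(df$Reg), decreasing = TRUE), ]
df <- df[seq(nrow(df), 1), ]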

Result:

print(df, row.names = F)
 Reg   Res     Pop
   A Total 1000915
   A Rural  500501
   A Urban  500414
   B Total  999938
   B Rural  500016
   B Urban  499922
   C Total 1000912
   C Rural  499274
   C Urban  501638
   D Total  999629
   D Rural  499825
   D Urban  499804
   E Total 1000303
   E Rural  500386
   E Urban  499917
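The decreasing sort followed by a reversal amounts to sorting by Reg ascending and breaking ties by descending row position. A base R sketch doing this in a single order() call (aggregate() replaces the dplyr step here, and the Pop values are copied from the output above):

```r
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
Pop <- c(500414L, 500501L, 499922L, 500016L, 501638L,
         499274L, 499804L, 499825L, 499917L, 500386L)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)

# Append one "Total" row per region at the bottom
totals <- aggregate(Pop ~ Reg, data = df, FUN = sum)
df <- rbind(df, data.frame(Reg = totals$Reg, Res = "Total", Pop = totals$Pop,
                           stringsAsFactors = FALSE))

# Region ascending; within a region, later rows first, so the appended
# total comes first, followed by the detail rows in reverse order
df <- df[order(df$Reg, -seq_len(nrow(df))), ]
print(df, row.names = FALSE)
```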

If you want to print the groups with blank lines between them, without changing the data types:

for(g in unique(df$Reg)){
  print(df[df$Reg==g,], row.names = F)
  cat("\n")
}
 Reg   Res     Pop
   A Total 1000915
   A Rural  500501
   A Urban  500414

 Reg   Res    Pop
   B Total 999938
   B Rural 500016
   B Urban 499922

 Reg   Res     Pop
   C Total 1000912
   C Rural  499274
   C Urban  501638

 Reg   Res    Pop
   D Total 999629
   D Rural 499825
   D Urban 499804

 Reg   Res     Pop
   E Total 1000303
   E Rural  500386
   E Urban  499917

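You also requested a data.table solution. It is identical to the one above, except that df1 is created like this:

library(data.table)
dt  <- as.data.table(df)
# Name the columns Reg and Total so that df1$Reg and df1$Total work as above
df1 <- dt[, .(Total = sum(Pop)), by = Reg]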

The development version 1.10.5 of the data.table package has three new functions for calculating aggregates at various levels of grouping, rollup(), cube() and groupingsets(), which can be used here. They are included in released versions since data.table 1.11.0.

Note that the OP's expected result has contiguous row numbers 1 to 15, which suggests that the OP expects a single data.frame or data.table rather than a list as preferred by Frank. However, we will show below that a data.table can also be printed in an eye-friendly way.

rollup()

With the new rollup() function and ordering by Reg

library(data.table)   # rollup() requires data.table >= 1.11.0 (development version 1.10.5 at the time of writing)
setDT(df)
rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"))[order(Reg)]

we do get

    Reg   Res     Pop
 1:   A Urban  500414
 2:   A Rural  500501
 3:   A    NA 1000915
 4:   B Urban  499922
 5:   B Rural  500016
 6:   B    NA  999938
 7:   C Urban  501638
 8:   C Rural  499274
 9:   C    NA 1000912
10:   D Urban  499804
11:   D Rural  499825
12:   D    NA  999629
13:   E Urban  499917
14:   E Rural  500386
15:   E    NA 1000303
16:  NA    NA 5001697

The respective totals are indicated by NA (including a grand total). To reproduce the expected result more closely, the grand total can be removed and NA replaced by Total:

rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"))[order(Reg)][
  is.na(Res), Res := "Total"][!is.na(Reg)]
    Reg   Res     Pop
 1:   A Urban  500414
 2:   A Rural  500501
 3:   A Total 1000915
 4:   B Urban  499922
 5:   B Rural  500016
 6:   B Total  999938
 7:   C Urban  501638
 8:   C Rural  499274
 9:   C Total 1000912
10:   D Urban  499804
11:   D Rural  499825
12:   D Total  999629
13:   E Urban  499917
14:   E Rural  500386
15:   E Total 1000303

Note that the Total rows appear below the detail rows, which doesn't quite match the OP's expected result.
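To make clear what rollup() computes here, the following base R sketch builds the same three grouping levels by hand: every (Reg, Res) pair, one subtotal per Reg with Res set to NA, and a grand total with both columns NA. It only illustrates the shape of the result and is not how data.table implements rollup(); within each region, aggregate() lists Rural before Urban.

```r
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
Pop <- c(500414L, 500501L, 499922L, 500016L, 501638L,
         499274L, 499804L, 499825L, 499917L, 500386L)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)

# Level 1: every (Reg, Res) combination
detail <- aggregate(Pop ~ Reg + Res, data = df, FUN = sum)
# Level 2: per-region subtotals; Res is rolled up and marked NA
by_reg <- aggregate(Pop ~ Reg, data = df, FUN = sum)
by_reg$Res <- NA_character_
# Level 3: grand total; both grouping columns are NA
grand <- data.frame(Reg = NA_character_, Res = NA_character_, Pop = sum(df$Pop),
                    stringsAsFactors = FALSE)

cols <- c("Reg", "Res", "Pop")
rolled <- rbind(detail[, cols], by_reg[, cols], grand[, cols])
rolled <- rolled[order(rolled$Reg), ]  # NA sorts last, so the grand total ends up at the bottom
rownames(rolled) <- NULL
rolled
```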

groupingsets()

With the groupingsets() function, the aggregations can be controlled in great detail:

groupingsets(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"), 
             sets = list("Reg", c("Reg", "Res")))[order(Reg)][
               is.na(Res), Res := "Total"][]
    Reg   Res     Pop
 1:   A Total 1000915
 2:   A Urban  500414
 3:   A Rural  500501
 4:   B Total  999938
 5:   B Urban  499922
 6:   B Rural  500016
 7:   C Total 1000912
 8:   C Urban  501638
 9:   C Rural  499274
10:   D Total  999629
11:   D Urban  499804
12:   D Rural  499825
13:   E Total 1000303
14:   E Urban  499917
15:   E Rural  500386

Now, the Total rows appear above the details rows and no grand total was created at all.
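groupingsets() generalises rollup(): the grouping sets are passed explicitly. As an illustration only, here is a hypothetical base R helper, grouping_sets_sum() (not part of data.table or any other package), that sums one column over each set and fills the columns outside the set with NA, applied with the same sets as above:

```r
# Hypothetical helper, not from any package: for each grouping set in `sets`,
# sum the `value` column; `by` columns outside the set are filled with NA.
grouping_sets_sum <- function(data, value, by, sets) {
  pieces <- lapply(sets, function(s) {
    if (length(s) == 0) {
      # Empty set: a single grand-total row
      agg <- data.frame(sum(data[[value]]))
      names(agg) <- value
    } else {
      agg <- aggregate(data[value], by = data[s], FUN = sum)
    }
    # data[[col]][NA_integer_] is an NA of the column's own type
    for (col in setdiff(by, s)) agg[[col]] <- data[[col]][NA_integer_]
    agg[c(by, value)]
  })
  do.call(rbind, pieces)
}

Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
Pop <- c(500414L, 500501L, 499922L, 500016L, 501638L,
         499274L, 499804L, 499825L, 499917L, 500386L)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)

out <- grouping_sets_sum(df, "Pop", by = c("Reg", "Res"),
                         sets = list("Reg", c("Reg", "Res")))
out$Res[is.na(out$Res)] <- "Total"
out <- out[order(out$Reg), ]  # stable sort: each region's total stays on top
rownames(out) <- NULL
out
```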

Nicely printed "classic" data.table solutions

So far, two "classic" data.table solutions have been posted, by Psidom and Hack-R.

Both could be rewritten more concisely as

rbind(df[, .(Res = "Total", Pop = sum(Pop)), by = Reg], df)[order(Reg)]

The result can be printed in an "eye-friendly" way with blank lines between the groups using

rbind(df[, .(Res = "Total", Pop = sum(Pop)), by = Reg], df)[
  order(Reg), {print(data.table(Reg, .SD), row.names = FALSE); cat("\n")}, by = Reg]
 Reg   Res     Pop
   A Total 1000915
   A Urban  500414
   A Rural  500501

 Reg   Res    Pop
   B Total 999938
   B Urban 499922
   B Rural 500016

 Reg   Res     Pop
   C Total 1000912
   C Urban  501638
   C Rural  499274

 Reg   Res    Pop
   D Total 999629
   D Urban 499804
   D Rural 499825

 Reg   Res     Pop
   E Total 1000303
   E Urban  499917
   E Rural  500386