I want to append a row of sums for each Reg like this
Reg Res Pop
1 Total 1000915
2 A Urban 500414
3 A Rural 500501
4 Total 999938
5 B Urban 499922
6 B Rural 500016
7 Total 1000912
8 C Urban 501638
9 C Rural 499274
10 Total 999629
11 D Urban 499804
12 D Rural 499825
13 Total 1000303
14 E Urban 499917
15 E Rural 500386
MWE is below:
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df <- data.frame(Reg, Res, Pop)
df
Reg Res Pop
1 A Urban 500414
2 A Rural 500501
3 B Urban 499922
4 B Rural 500016
5 C Urban 501638
6 C Rural 499274
7 D Urban 499804
8 D Rural 499825
9 E Urban 499917
10 E Rural 500386
library(dplyr)
df %>%
  group_by(Reg) %>%
  summarise(Total = sum(Pop))
# A tibble: 5 x 2
Reg Total
<fctr> <int>
1 A 1000915
2 B 999938
3 C 1000912
4 D 999629
5 E 1000303
Edited
I would like to have both dplyr and data.table solutions.
You can add an extra Res column to the summary and then bind_rows with the original data frame:
df %>%
group_by(Reg) %>%
summarise(Pop = sum(Pop), Res = 'Total') %>%
bind_rows(df) %>%
arrange(Reg)
# A tibble: 15 x 3
# Reg Pop Res
# <chr> <int> <chr>
# 1 A 1000915 Total
# 2 A 500414 Urban
# 3 A 500501 Rural
# 4 B 999938 Total
# 5 B 499922 Urban
# 6 B 500016 Rural
# 7 C 1000912 Total
# 8 C 501638 Urban
# 9 C 499274 Rural
#10 D 999629 Total
#11 D 499804 Urban
#12 D 499825 Rural
#13 E 1000303 Total
#14 E 499917 Urban
#15 E 500386 Rural
A corresponding data.table solution:
library(data.table)
dt <- setDT(df)
# order(Reg) keeps each Total row next to the rows of its region
rbindlist(list(dt[, .(Pop = sum(Pop), Res = 'Total'), Reg], dt), use.names = TRUE)[order(Reg)]
lapply(split(df, df$Reg),
function(a) rbind(data.frame(Reg = a$Reg[1],
Res = "Total",
Pop = sum(a$Pop)),
a))
$A
Reg Res Pop
1 A Total 1000915
2 A Urban 500414
3 A Rural 500501
$B
Reg Res Pop
1 B Total 999938
3 B Urban 499922
4 B Rural 500016
$C
Reg Res Pop
1 C Total 1000912
5 C Urban 501638
6 C Rural 499274
$D
Reg Res Pop
1 D Total 999629
7 D Urban 499804
8 D Rural 499825
$E
Reg Res Pop
1 E Total 1000303
9 E Urban 499917
10 E Rural 500386
You could convert the whole thing to a data.frame by using do.call(rbind, ...) if you want.
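The do.call(rbind, ...) step mentioned above can be sketched as follows. This is only an illustration on a small made-up data set (two regions, invented populations), using base R alone; resetting the row names afterwards is an optional extra.

```r
# Small made-up example data (not the data from the question)
df <- data.frame(
  Reg = rep(c("A", "B"), each = 2),
  Res = rep(c("Urban", "Rural"), times = 2),
  Pop = c(10L, 20L, 30L, 40L),
  stringsAsFactors = FALSE
)

# One data frame per region, each with a Total row placed on top
pieces <- lapply(split(df, df$Reg), function(a) {
  rbind(data.frame(Reg = a$Reg[1], Res = "Total", Pop = sum(a$Pop),
                   stringsAsFactors = FALSE),
        a)
})

# Stack the list elements into a single data frame
combined <- do.call(rbind, pieces)

# Replace the row names inherited from the list with plain 1..n
rownames(combined) <- NULL
combined
```

The result is one data frame with a Total row directly above the Urban and Rural rows of each region.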
Stacking and rearranging will work:
library(dplyr)
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df <- data.frame(Reg, Res, Pop, stringsAsFactors = FALSE)
sums <- df %>%
group_by(Reg) %>%
summarise(Pop = sum(Pop)) %>%
mutate(Res = "Total")
df_sums <- bind_rows(df, sums) %>%
arrange(Reg, Res)
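One thing to watch with arrange(Reg, Res): Res is sorted alphabetically, so each Total row ends up between Rural and Urban rather than at the top of its region. A minimal sketch of one way to control this, assuming a small made-up data set and a level order (res_levels, a name chosen here for illustration) that puts Total first:

```r
library(dplyr)

# Small made-up example data (not the data from the question)
df <- data.frame(
  Reg = rep(c("A", "B"), each = 2),
  Res = rep(c("Urban", "Rural"), times = 2),
  Pop = c(10L, 20L, 30L, 40L),
  stringsAsFactors = FALSE
)

sums <- df %>%
  group_by(Reg) %>%
  summarise(Pop = sum(Pop)) %>%
  mutate(Res = "Total")

# Assumed display order: Total first, then the detail rows
res_levels <- c("Total", "Urban", "Rural")

df_sums <- bind_rows(df, sums) %>%
  mutate(Res = factor(Res, levels = res_levels)) %>%  # sort by level order, not alphabet
  arrange(Reg, Res)
df_sums
```

Putting "Total" last in res_levels would move the Total rows below the detail rows instead.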
We can use dplyr and purrr. This is similar to d.b's method, but the output of map_dfr is a data frame, so no further conversion from list to data frame is needed. Notice that I used the data_frame function to construct df because factors are not needed for this analysis. df2 is the final output.
library(dplyr)
library(purrr)
df <- data_frame(Reg, Res, Pop)
df2 <- df %>%
split(.$Reg) %>%
map_dfr(~bind_rows(.x, data_frame(Reg = .x$Reg[1], Res = "Total", Pop = sum(.x$Pop))))
df2
# A tibble: 15 x 3
Reg Res Pop
<chr> <chr> <int>
1 A Urban 500414
2 A Rural 500501
3 A Total 1000915
4 B Urban 499922
5 B Rural 500016
6 B Total 999938
7 C Urban 501638
8 C Rural 499274
9 C Total 1000912
10 D Urban 499804
11 D Rural 499825
12 D Total 999629
13 E Urban 499917
14 E Rural 500386
15 E Total 1000303
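In current releases of tibble and dplyr, data_frame() is deprecated in favour of tibble(). The same split-and-bind idea could be written roughly as below; this is a sketch on a small made-up data set, and the helper add_total is a name chosen here for illustration, not a function from any package.

```r
library(dplyr)

# Small made-up example data (not the data from the question)
df <- tibble(
  Reg = rep(c("A", "B"), each = 2),
  Res = rep(c("Urban", "Rural"), times = 2),
  Pop = c(10L, 20L, 30L, 40L)
)

# Hypothetical helper: append a Total row to one region's rows
add_total <- function(g) {
  bind_rows(g, tibble(Reg = g$Reg[1], Res = "Total", Pop = sum(g$Pop)))
}

df2 <- df %>%
  split(.$Reg) %>%      # list with one tibble per region
  lapply(add_total) %>% # add the Total row to each piece
  bind_rows()           # stack the pieces back into one tibble
df2
```

As in the answer above, the Total row ends up below the detail rows of each region.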
Your data:
Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df <- data.frame(Reg, Res, Pop)
require(dplyr)
df1 <-
df %>%
group_by(Reg) %>%
summarise(Total = sum(Pop))
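My solution (note: I also send the earlier pipe to df1):
# Append the totals, sort by Reg in decreasing order, then reverse all rows
# so that regions ascend and each Total row comes first within its region
df <- rbind(df, data.frame(Reg = df1$Reg, Res = "Total", Pop = df1$Total))
df <- df[order(as.character(df$Reg), decreasing = TRUE), ]
df <- df[seq(dim(df)[1], 1), ]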
Result:
print(df, row.names = F)
Reg Res Pop
A Total 1000915
A Rural 500501
A Urban 500414
B Total 999938
B Rural 500016
B Urban 499922
C Total 1000912
C Rural 499274
C Urban 501638
D Total 999629
D Rural 499825
D Urban 499804
E Total 1000303
E Rural 500386
E Urban 499917
If you want to print them with line breaks in between the groups, without changing the data types:
for(g in unique(df$Reg)){
print(df[df$Reg==g,], row.names = F)
cat("\n")
}
Reg Res Pop
A Total 1000915
A Rural 500501
A Urban 500414

Reg Res Pop
B Total 999938
B Rural 500016
B Urban 499922

Reg Res Pop
C Total 1000912
C Rural 499274
C Urban 501638

Reg Res Pop
D Total 999629
D Rural 499825
D Urban 499804

Reg Res Pop
E Total 1000303
E Rural 500386
E Urban 499917
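You also requested a data.table solution. This is identical to what's above, except create df1 like this:
library(data.table)
dt <- as.data.table(df)
# Name the columns Reg and Total so that the rbind() step above finds df1$Reg and df1$Total
df1 <- dt[, .(Total = sum(Pop)), by = Reg]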
The development version 1.10.5 of the data.table package (see here for installation instructions) has three new functions for calculating aggregates at various levels of grouping, which can be used here.
Note that OP's expected result contains contiguous row numbers 1 to 15, which suggests that the OP is expecting one data.frame or data.table rather than a list as preferred by Frank. However, we will show below that a data.table can also be printed in an eye-friendly way.
rollup()
With the new rollup() function and ordering by Reg
library(data.table) # development version 1.10.5 as of 2017-09-10
setDT(df)
rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"))[order(Reg)]
we do get
Reg Res Pop
1: A Urban 500414
2: A Rural 500501
3: A NA 1000915
4: B Urban 499922
5: B Rural 500016
6: B NA 999938
7: C Urban 501638
8: C Rural 499274
9: C NA 1000912
10: D Urban 499804
11: D Rural 499825
12: D NA 999629
13: E Urban 499917
14: E Rural 500386
15: E NA 1000303
16: NA NA 5001697
The respective totals are indicated by NA (including a grand total). If we want to better reproduce the expected result, the grand total can be removed and NA be replaced by Total:
rollup(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"))[order(Reg)][
is.na(Res), Res := "Total"][!is.na(Reg)]
Reg Res Pop
1: A Urban 500414
2: A Rural 500501
3: A Total 1000915
4: B Urban 499922
5: B Rural 500016
6: B Total 999938
7: C Urban 501638
8: C Rural 499274
9: C Total 1000912
10: D Urban 499804
11: D Rural 499825
12: D Total 999629
13: E Urban 499917
14: E Rural 500386
15: E Total 1000303
Note that the Total rows appear below the detail rows, which isn't fully in line with OP's expected result.
groupingsets()
With the groupingsets() function, the aggregations can be controlled in great detail:
groupingsets(df, j = list(Pop = sum(Pop)), by = c("Reg", "Res"),
sets = list("Reg", c("Reg", "Res")))[order(Reg)][
is.na(Res), Res := "Total"][]
Reg Res Pop
1: A Total 1000915
2: A Urban 500414
3: A Rural 500501
4: B Total 999938
5: B Urban 499922
6: B Rural 500016
7: C Total 1000912
8: C Urban 501638
9: C Rural 499274
10: D Total 999629
11: D Urban 499804
12: D Rural 499825
13: E Total 1000303
14: E Urban 499917
15: E Rural 500386
Now, the Total rows appear above the detail rows and no grand total was created at all.
Nicely printed "classic" data.table solutions
Up to now, two "classic" data.table solutions were posted by Psidom and Hack-R. Both could be re-written more concisely as
rbind(df[, .(Res = "Total", Pop = sum(Pop)), by = Reg], df)[order(Reg)]
The result can be printed in an "eye-friendly" way with blank lines between the groups using
rbind(df[, .(Res = "Total", Pop = sum(Pop)), by = Reg], df)[
order(Reg), {print(data.table(Reg, .SD), row.names = FALSE); cat("\n")}, by = Reg]
Reg Res Pop
A Total 1000915
A Urban 500414
A Rural 500501

Reg Res Pop
B Total 999938
B Urban 499922
B Rural 500016

Reg Res Pop
C Total 1000912
C Urban 501638
C Rural 499274

Reg Res Pop
D Total 999629
D Urban 499804
D Rural 499825

Reg Res Pop
E Total 1000303
E Urban 499917
E Rural 500386
Source: https://stackoverflow.com/questions/46122047/appending-a-row-of-sums-for-each-level-of-a-factor