I have a data.table
like:
library(data.table)
widgets <- data.table(serial_no=1:100,
color=rep_len(c(\"red\",\"green\"
This is almost a single step:
# A
widgets[,{
totwt = .N
.SD[,.(frac=.N/totwt),by=style]
},by=color]
# color style frac
# 1: red round 0.36
# 2: red pointy 0.32
# 3: red flat 0.32
# 4: green pointy 0.36
# 5: green flat 0.32
# 6: green round 0.32
# 7: blue flat 0.36
# 8: blue round 0.32
# 9: blue pointy 0.32
# 10: black round 0.36
# 11: black pointy 0.32
# 12: black flat 0.32
# B
widgets[,{
totwt = sum(weight)
.SD[,.(frac=sum(weight)/totwt),by=style]
},by=color]
# color style frac
# 1: red round 0.3466667
# 2: red pointy 0.3466667
# 3: red flat 0.3066667
# 4: green pointy 0.3333333
# 5: green flat 0.3200000
# 6: green round 0.3466667
# 7: blue flat 0.3866667
# 8: blue round 0.2933333
# 9: blue pointy 0.3200000
# 10: black round 0.3733333
# 11: black pointy 0.3333333
# 12: black flat 0.2933333
How it works: Construct your denominator for the top-level group (color
) before going to the finer group (color
with style
) to tabulate.
Alternatives. If style
s repeat within each color
and this is only for display purposes, try a table
:
# A
widgets[,
prop.table(table(color,style),1)
]
# style
# color flat pointy round
# black 0.32 0.32 0.36
# blue 0.36 0.32 0.32
# green 0.32 0.36 0.32
# red 0.32 0.32 0.36
# B
widgets[,rep(1L,sum(weight)),by=.(color,style)][,
prop.table(table(color,style),1)
]
# style
# color flat pointy round
# black 0.2933333 0.3333333 0.3733333
# blue 0.3866667 0.3200000 0.2933333
# green 0.3200000 0.3333333 0.3466667
# red 0.3066667 0.3466667 0.3466667
For B, this expands the data so that there is one observation for each unit of weight. With large data, such an expansion would be a bad idea (since it costs so much memory). Also, weight
has to be an integer; otherwise, its sum will be silently truncated to one (e.g., try rep(1,2.5) # [1] 1 1
).