I want to summarise the percentage of people who have been treated, BY region.
I have created a dummy dataset for this purpose:
id <- seq_len(1000)  # 1, 2, ..., 1000
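The rest of the dummy dataset is not shown, so the following is a hypothetical stand-in that makes the answer below reproducible. The column names `region` and `treatment` and the data frame name `d` come from the answer's code. The values are assumptions chosen to be consistent with the printed output: five regions of 200 people each, half of them treated, and no missing ids.

```r
# Hypothetical stand-in for the truncated dummy data (not the original):
# 1000 people, five regions of 200 each, treatment alternating 0/1.
id <- seq_len(1000)
region <- rep(c("A", "B", "C", "D", "E"), each = 200)
treatment <- rep(c(0, 1), times = 500)
d <- data.frame(id = id, region = region, treatment = treatment)
```

Because each region block starts on an odd row, every region gets exactly 100 treated and 100 untreated people.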
For completeness, here's how you can do it using ddply() from plyr:
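library(plyr)

# Drop rows with a missing id, then summarise within each region
ddply(d[!is.na(d$id), ], .(region), summarize,
      N    = length(region),        # number of people in the region
      prop = mean(treatment == 1))  # share treated (multiply by 100 for a percentage)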
#   region   N prop
# 1      A 200  0.5
# 2      B 200  0.5
# 3      C 200  0.5
# 4      D 200  0.5
# 5      E 200  0.5
This assumes you want to handle the NA values in id by dropping those rows before summarising.
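If you would rather not depend on plyr, the same summary can be sketched in base R with tapply(). The data here are hypothetical (five regions of 200, alternating treatment), with two ids in region A set to NA purely to show the rows being dropped. The pct column is an added convenience, since the question asks for a percentage rather than a proportion.

```r
# Hypothetical data; two ids set to NA to illustrate dropping rows
d <- data.frame(
  id = seq_len(1000),
  region = rep(c("A", "B", "C", "D", "E"), each = 200),
  treatment = rep(c(0, 1), times = 500)
)
d$id[c(1, 2)] <- NA  # both rows fall in region A

# Remove observations whose id is missing
kept <- d[!is.na(d$id), ]

# Per-region count and share treated, using base R only
N <- tapply(kept$treatment, kept$region, length)
prop <- tapply(kept$treatment == 1, kept$region, mean)

res <- data.frame(
  region = names(N),
  N = as.vector(N),
  prop = as.vector(prop),
  pct = 100 * as.vector(prop)  # percentage treated
)
print(res)
```

Region A ends up with 198 rows (one treated and one untreated row were removed), so its proportion stays at 0.5 while its N drops.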