问题
I want to produce a pdf which shows multiple graphs, one for each NetworkTrackingPixelId.
I have a data frame similar to this:
> head(data)
NetworkTrackingPixelId Name Date Impressions
1 2421 Rubicon RTB 2014-02-16 168801
2 2615 Google RTB 2014-02-16 1215235
3 3366 OpenX RTB 2014-02-16 104419
4 3606 AppNexus RTB 2014-02-16 170757
5 3947 Pubmatic RTB 2014-02-16 68690
6 4299 Improve Digital RTB 2014-02-16 701
I was thinking to use a script similar to the one below:
# create a vector which stores the NetworkTrackingPixelIds
tp <- data %.%
group_by(NetworkTrackingPixelId) %.%
select(NetworkTrackingPixelId)
# create a for loop to print the line graphs
for (i in tp) {
print(ggplot(data[which(data$NetworkTrackingPixelId == i), ], aes(x = Date, y = Impressions)) + geom_point() + geom_line())
}
I was expecting this command to produce many graphs, one for each NetworkTrackingPixelId. Instead the result is an unique graph which aggregate all the NetworkTrackingPixelIds.
Another thing I've noticed is that the variable tp is not a real vector.
> is.vector(tp)
[1] FALSE
Even if I force it..
tp <- as.vector(data %.%
group_by(NetworkTrackingPixelId) %.%
select(NetworkTrackingPixelId))
> is.vector(tp)
[1] FALSE
> str(tp)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1397 obs. of 1 variable:
$ NetworkTrackingPixelId: int 2421 2615 3366 3606 3947 4299 4429 4786 6046 6286 ...
- attr(*, "vars")=List of 1
..$ : symbol NetworkTrackingPixelId
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 63
..$ : int 24 69 116 162 205 253 302 351 402 454 ...
..$ : int 1 48 94 140 184 232 281 330 380 432 ...
[I've cut a bit this output]
- attr(*, "group_sizes")= int 29 29 2 16 29 1 29 29 29 29 ...
- attr(*, "biggest_group_size")= int 29
- attr(*, "labels")='data.frame': 63 obs. of 1 variable:
..$ NetworkTrackingPixelId: int 8799 2615 8854 8869 4786 7007 3947 9109 9126 9137 ...
..- attr(*, "vars")=List of 1
.. ..$ : symbol NetworkTrackingPixelId
回答1:
Since I don't have your dataset, I will use the mtcars dataset to illustrate how to do this using dplyr and data.table. Both packages are the finest examples of the split-apply-combine paradigm in rstats. Let me explain:
Step 1 Split data by gear
dplyruses the functiongroup_bydata.tableuses argumentby
Step 2: Apply a function
dplyrusesdoto which you can pass a function that uses the pieces x.data.tableinterprets the variables to the function in context of each piece.
Step 3: Combine
There is no combine step here, since we are saving the charts created to file.
library(dplyr)
mtcars %.%
group_by(gear) %.%
do(function(x){ggsave(
filename = sprintf("gear_%s.pdf", unique(x$gear)), qplot(wt, mpg, data = x)
)})
library(data.table)
mtcars_dt = data.table(mtcars)
mtcars_dt[,ggsave(
filename = sprintf("gear_%s.pdf", unique(gear)), qplot(wt, mpg)),
by = gear
]
UPDATE: To save all files into one pdf, here is a quick solution.
plots = mtcars %.%
group_by(gear) %.%
do(function(x) {
qplot(wt, mpg, data = x)
})
pdf('all.pdf')
invisible(lapply(plots, print))
dev.off()
回答2:
I recently had a project that required producing a lot of individual pngs for each record. I found I got a huge speed up doing some pretty simple parallelization. I am not sure if this is more performant than the dplyr or data.table technique but it may be worth trying. I saw a huge speed bump:
require(foreach)
require(doParallel)
workers <- makeCluster(4)
registerDoParallel(workers)
foreach(i = seq(1, length(mtcars$gear)), .packages=c('ggplot2')) %dopar% {
j <- qplot(wt, mpg, data = mtcars[i,])
png(file=paste(getwd(), '/images/',mtcars[i, c('gear')],'.png', sep=''))
print(j)
dev.off()
}
回答3:
I think you would be better off writing a function for plotting, then using lapply for every Network Tracking Pixel.
For example, your function might look like:
plot.function <- function(ntpid){
sub = subset(dataset, dataset$networktrackingpixelid == ntpid)
ggobj = ggplot(data=sub, aes(...)) + geom...
ggsave(filename=sprintf("%s.pdf", ntpid))
}
It would be helpful for you to put a reproducible example, but I hope this works! Not sure about the vector issue though..
Cheers!
回答4:
Unless I'm missing something, generating plots by a subsetting variable is very simple. You can use split(...) to split the original data into a list of data frames by NetworkTrackingPixelId, and then pass those to ggplot using lapply(...). Most of the code below is just to crate a sample dataset.
# create sample data
set.seed(1)
names <- c("Rubicon","Google","OpenX","AppNexus","Pubmatic")
dates <- as.Date("2014-02-16")+1:10
df <- data.frame(NetworkTrackingPixelId=rep(1:5,each=10),
Name=sample(names,50,replace=T),
Date=dates,
Impressions=sample(1000:10000,50))
# end create sample data
pdf("plots.pdf")
lapply(split(df,df$NetworkTrackingPixelId),
function(gg) ggplot(gg,aes(x = Date, y = Impressions)) +
geom_point() + geom_line()+
ggtitle(paste("NetworkTrackingPixelId:",gg$NetworkTrackingPixelId)))
dev.off()
This generates a pdf containing 5 plots, one for each NetworkTrackingPixelId.
来源:https://stackoverflow.com/questions/22464570/arrange-multiple-graphs-using-a-for-loop-in-ggplot2