Export fixed range of columns from dataframe to pdf (one “slice” per sheet)

Submitted by 别等时光非礼了梦想 on 2019-12-13 04:09:13

Question


R beginner here with a probably rather simple question.

I have a dataframe as seen below in the reproducible sample. What I want to do is export it to pdf, so that I always have 4 columns (Country, Fruit, Start, End) per pdf sheet. So in this case I would need a pdf file with 3 pages, one each for Apples, Bananas and Cherries.

In reality I have 50 "Fruits", hence a loop of sorts would be useful, preferably using grid.table or similar due to its nicely formatted output tables.

Country <- c("AUS", "AT", "BE", "CHN", "US")
Fruit <- c(rep("Apples", 5))
Start <- c(1999, 1998, 1987, 1988, 1997)
End <- c(2014, 2014, 2015, 2013, 2014)
Country.1 <- c("AUS", "AT", "BE", "CHN", "US")
Fruit.1 <- c(rep("Bananas", 5))
Start.1 <- c(1998, 1999, 1987, 1988, 1999)
End.1 <- c(2014, 2014, 2014, 2014, 2015)
Country.2 <- c("AUS", "AT", "BE", "CHN", "US")
Fruit.2 <- c(rep("Cherries", 5))
Start.2 <- c(1981, 1988, 1987, 1977, 1999)
End.2 <- c(2014, 2014, 2015, 2013, 2014)

mydf <- data.frame(Country, Fruit, Start, End, Country.1, Fruit.1, Start.1, End.1, Country.2, Fruit.2, Start.2, End.2)

I tried to work with the expression mydf[c(TRUE, seq(FALSE, 4))], and tried to incorporate it in grid.table (from gridExtra) but couldn't make it work. I would hugely appreciate any help.

Furthermore (and not as important), I'd like to ask you to comment on the structure of this data. The way this dataframe is set up, I basically have lots of duplicate columns (Country). I doubt that this is the ideal way to work with data in R, and I would highly appreciate any comments that would help me improve my R skills in this regard, since I want to get better at handling big (mostly panel) datasets in R.

Edit: I assume I could delete the duplicate Country columns, as they do not change.

Edit2: Below is a small sample of data representing my initial data structure. x1-x12 are the "Fruits", y1-y9 are analogous to the countries of the previous sample.

Fruits <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10",  "x11", "x12")
y1 <- c(round(runif(12, 1980, 1999), digits = 0))
y2 <- c(round(runif(12, 1980, 1999), digits = 0))
y3 <- c(round(runif(12, 1980, 1999), digits = 0))
y4 <- c(round(runif(12, 1980, 1999), digits = 0))
y5 <- c(round(runif(12, 1980, 1999), digits = 0))
y6 <- c(round(runif(12, 1980, 1999), digits = 0))
y7 <- c(round(runif(12, 1980, 1999), digits = 0))
y8 <- c(round(runif(12, 1980, 1999), digits = 0))
y9 <- c(round(runif(12, 1980, 1999), digits = 0))
startdf <- data.frame(Fruits, y1, y2, y3, y4, y5, y6, y7, y8, y9)
y1 <- c(round(runif(12, 2012, 2015), digits = 0))
y2 <- c(round(runif(12, 2012, 2015), digits = 0))
y3 <- c(round(runif(12, 2012, 2015), digits = 0))
y4 <- c(round(runif(12, 2012, 2015), digits = 0))
y5 <- c(round(runif(12, 2012, 2015), digits = 0))
y6 <- c(round(runif(12, 2012, 2015), digits = 0))
y7 <- c(round(runif(12, 2012, 2015), digits = 0))
y8 <- c(round(runif(12, 2012, 2015), digits = 0))
y9 <- c(round(runif(12, 2012, 2015), digits = 0))
enddf <- data.frame(Fruits, y1, y2, y3, y4, y5, y6, y7, y8, y9)

Answer 1:


First of all, you're overwriting the Start and End columns in your sample, so I've added numbers to them as well and started the numbering at 1:

Country <- c("AUS", "AT", "BE", "CHN", "US")
Fruit.1 <- c(rep("Apples", 5))
Start.1 <- c(1999, 1998, 1987, 1988, 1997)
End.1 <- c(2014, 2014, 2015, 2013, 2014)
Fruit.2 <- c(rep("Bananas", 5))
Start.2 <- c(1998, 1999, 1987, 1988, 1999)
End.2 <- c(2014, 2014, 2014, 2014, 2015)
Fruit.3 <- c(rep("Cherries", 5))
Start.3 <- c(1981, 1988, 1987, 1977, 1999)
End.3 <- c(2014, 2014, 2015, 2013, 2014)

mydf <- data.frame(Country, Fruit.1, Start.1, End.1, Fruit.2, Start.2, End.2, Fruit.3, Start.3, End.3)

Notice how I only put the country column once, at the beginning of the data frame. There is really no need to have duplicated columns in a data frame as you can always pull them out by name, number or logical value.

I've made a simple loop based on a regular expression to print each separate part on a new page:

library(gridExtra)
pdf("data_output.pdf", height=11, width=8.5)
for(i in 1:3) {
  plot.new()
  regex <- paste0("Country|",i)
  tempdf <- mydf[grepl(regex,names(mydf))]
  grid.table(tempdf)
}
dev.off()

The regular expression pulls out all the columns that either have "Country" or (| = OR) the digit "i" in the column name. The plot.new() command ensures that grid.table starts a new page.
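One caveat with 50 fruits: the pattern "Country|1" matches every column name that contains a 1 anywhere, so it would also pick up columns ending in .10, .11 and so on. A minimal sketch of an anchored pattern follows; the column names with two-digit suffixes are hypothetical and only illustrate the naming scheme used above.

```r
# Hypothetical column names for a data frame with at least 11 fruits
cols <- c("Country",
          "Fruit.1",  "Start.1",  "End.1",
          "Fruit.10", "Start.10", "End.10",
          "Fruit.11", "Start.11", "End.11")

i <- 1

# Unanchored pattern: any name containing "Country" or the digit 1
loose <- cols[grepl(paste0("Country|", i), cols)]

# Anchored pattern: exactly "Country", or a name ending in ".<i>"
strict <- cols[grepl(paste0("^Country$|\\.", i, "$"), cols)]

print(loose)   # all ten names
print(strict)  # "Country" plus the three ".1" columns
```

Inside the loop, the same anchored pattern could replace the regex line, assuming the columns keep the Name.number scheme.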

ADDON: how I would organize your data.

The above writeup solves your original problem. Really though, the structure of your data lends itself perfectly to making a data frame composed of factors and numbers and using the powerful split-apply-combine framework from either base R or the plyr/dplyr packages.

Your data frame would ideally be reorganized as follows:

Country <- rep(c("AUS", "AT", "BE", "CHN", "US"), 3)
Fruit <- rep(c("Apples", "Bananas", "Cherries") , each = 5)
Start <- c(1999,1998,1987,1988,1997,1998,1999,1987,1988,1999,1981,1988,1987,1977,1999)
End <- c(2014,2014,2015,2013,2014,2014,2014,2014,2014,2015,2014,2014,2015,2013,2014)

mydf <- data.frame(Country, Fruit, Start, End)
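With 50 fruits you would probably not want to retype the vectors by hand. As a rough sketch, the wide data frame from the first part of this answer can be stacked into the long layout with a small loop; the names wide, long and n_fruits are placeholders chosen for this example.

```r
# Wide layout: one Fruit/Start/End triple per fruit, Country only once
wide <- data.frame(
  Country = c("AUS", "AT", "BE", "CHN", "US"),
  Fruit.1 = "Apples",
  Start.1 = c(1999, 1998, 1987, 1988, 1997),
  End.1   = c(2014, 2014, 2015, 2013, 2014),
  Fruit.2 = "Bananas",
  Start.2 = c(1998, 1999, 1987, 1988, 1999),
  End.2   = c(2014, 2014, 2014, 2014, 2015),
  Fruit.3 = "Cherries",
  Start.3 = c(1981, 1988, 1987, 1977, 1999),
  End.3   = c(2014, 2014, 2015, 2013, 2014)
)

n_fruits <- 3  # number of Fruit/Start/End triples in the wide data

# Build one 4-column block per fruit and stack the blocks row-wise
long <- do.call(rbind, lapply(seq_len(n_fruits), function(i) {
  data.frame(
    Country = wide$Country,
    Fruit   = wide[[paste0("Fruit.", i)]],
    Start   = wide[[paste0("Start.", i)]],
    End     = wide[[paste0("End.", i)]]
  )
}))

str(long)
```

The result has one row per Country/Fruit pair, which is the shape the approaches below expect.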

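Then you could either use base R to loop over the different types of fruit,

pdf("data_output.pdf", height=11, width=8.5)
# unique() works whether Fruit is a factor or a character column
# (character is the data.frame() default since R 4.0, where levels() returns NULL)
for(fruit in unique(as.character(mydf$Fruit))) {
  tempdf <- subset(mydf, Fruit == fruit)
  plot.new()
  grid.table(tempdf)
}
dev.off()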

or, you could use an lapply call like in Tensibai's answer,

or, you can use the by function

by(mydf, mydf$Fruit, function(x) {plot.new(); grid.table(x)})

or, you could use the ddply function of the plyr package to accomplish the same,

library(plyr)
ddply(mydf, .(Fruit), function(x) { plot.new(); grid.table(x) })

The advantage of the last two methods is that you can easily add other columns (e.g., Continent) and print more complicated subsets of your data, like Fruit = Apples and Continent = North America, without having to wrap everything in more loops.
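To illustrate that last point, here is a sketch that splits on two columns at once using base R's split(). The Continent lookup is an assumption made up for this example (it is not in the original data), and the PDF step is skipped if gridExtra is not installed.

```r
mydf <- data.frame(
  Country = rep(c("AUS", "AT", "BE", "CHN", "US"), 3),
  Fruit   = rep(c("Apples", "Bananas", "Cherries"), each = 5),
  Start   = c(1999, 1998, 1987, 1988, 1997, 1998, 1999, 1987, 1988, 1999,
              1981, 1988, 1987, 1977, 1999),
  End     = c(2014, 2014, 2015, 2013, 2014, 2014, 2014, 2014, 2014, 2015,
              2014, 2014, 2015, 2013, 2014)
)

# Assumed lookup table, only for illustration
continent <- c(AUS = "Oceania", AT = "Europe", BE = "Europe",
               CHN = "Asia", US = "North America")
mydf$Continent <- unname(continent[as.character(mydf$Country)])

# One data frame per Fruit/Continent combination that actually occurs
pieces <- split(mydf, list(mydf$Fruit, mydf$Continent), drop = TRUE)
print(names(pieces))

# One PDF page per piece
if (requireNamespace("gridExtra", quietly = TRUE)) {
  pdf("data_output_by_continent.pdf", height = 11, width = 8.5)
  for (piece in pieces) {
    grid::grid.newpage()          # start a fresh page for each table
    gridExtra::grid.table(piece)
  }
  dev.off()
}
```

A single subset is then available by name, for example pieces[["Apples.North America"]].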




Answer 2:


For your data organization I would go this way:

# Known Fixed list
Country <- c("AUS", "AT", "BE", "CHN", "US")
Fruits <- c("Apples","Bananas","Cherries")

# Build list of variables entry
Start <- list() # Init the list
End <- list() # Init the list
# Fill them up (1 being first fruit in Fruits)
Start[[1]] <- c(1999, 1998, 1987, 1988, 1997)
End[[1]] <- c(2014, 2014, 2015, 2013, 2014)
Start[[2]] <- c(1998, 1999, 1987, 1988, 1999)
End[[2]] <- c(2014, 2014, 2014, 2014, 2015)
Start[[3]] <- c(1981, 1988, 1987, 1977, 1999)
End[[3]] <- c(2014, 2014, 2015, 2013, 2014)

# Build a list of data frames by iterating over the Fruits
mydfs <- lapply(seq_along(Fruits), function(x) {
  data.frame(
    Country,
    Fruit = rep(Fruits[x], length(Start[[x]])),
    Start = Start[[x]],
    End   = End[[x]]
  )
})

Which gives:

> mydfs
[[1]]
  Country  Fruit Start  End
1     AUS Apples  1999 2014
2      AT Apples  1998 2014
3      BE Apples  1987 2015
4     CHN Apples  1988 2013
5      US Apples  1997 2014

[[2]]
  Country   Fruit Start  End
1     AUS Bananas  1998 2014
2      AT Bananas  1999 2014
3      BE Bananas  1987 2014
4     CHN Bananas  1988 2014
5      US Bananas  1999 2015

[[3]]
  Country    Fruit Start  End
1     AUS Cherries  1981 2014
2      AT Cherries  1988 2014
3      BE Cherries  1987 2015
4     CHN Cherries  1977 2013
5      US Cherries  1999 2014

And you can access any of these data frames like this:

> mydfs[[ as.factor(Fruits)[Fruits == "Bananas"] ]]
  Country   Fruit Start  End
1     AUS Bananas  1998 2014
2      AT Bananas  1999 2014
3      BE Bananas  1987 2014
4     CHN Bananas  1988 2014
5      US Bananas  1999 2015

So you can print any of them individually with grid.table(mydfs[[ as.factor(Fruits)[Fruits == "Bananas"] ]]), or all of them through an lapply call as shown by @user28400:

pdf("data_output.pdf", height=11, width=8.5)
lapply( mydfs, function(x) { 
                 plot.new()
                 grid.table(x)
               })
dev.off()
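A note on the factor-based lookup used above: as.factor() sorts its levels alphabetically, so the integer codes only line up with the list positions when Fruits happens to be in alphabetical order. A sketch of a less fragile alternative is to name the list elements and index by name; the Value column below is placeholder data for illustration only.

```r
# Deliberately not in alphabetical order
Fruits <- c("Cherries", "Apples", "Bananas")

# Small stand-in data frames, one per fruit
mydfs <- lapply(Fruits, function(f) data.frame(Fruit = f, Value = 1:2))

# Factor codes follow the sorted levels, not the positions in Fruits
codes <- as.integer(as.factor(Fruits))
print(codes)                               # code for "Bananas" is 2
print(which(Fruits == "Bananas"))          # but its list position is 3

# Naming the list makes the lookup independent of ordering
names(mydfs) <- Fruits
bananas <- mydfs[["Bananas"]]
print(bananas)
```

With a named list, the lapply call that prints every table works unchanged.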

Updated construction of mydfs per question update and comments.

nbYears <- length(startdf[1,-1])
#Build a list of data frame by iterating over the Fruits 
mydfs <- lapply( Fruits, # iterate over the Fruits to use their names
                 function(x) {  
                   lVec <- startdf$Fruits == x # build the logical vector (shorten the subset syntax later, not needed for perf)
                   data.frame( 
                     Country, 
                     Fruit = rep(x,nbYears), 
                     Start = unlist(startdf[lVec,-1]), # Get the subset of the df from the logical vector, omit the first column, and cast to vector instead of data.frame
                     End = unlist(enddf[lVec,-1]) # same as above
                   ) 
                 }
               )

This assumes that Country has as many entries as there are y columns in startdf and enddf; otherwise data.frame() stops with an error about differing numbers of rows.
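With the sample data from the question's Edit2 there are nine y columns but only five entries in the earlier Country vector, so the lengths do not match. As a sketch, one way to try the construction on that sample is to use the y column names as stand-ins for Country. That substitution, the helper make_df and set.seed() are assumptions added for this example.

```r
set.seed(1)  # only to make the random sample reproducible

Fruits <- paste0("x", 1:12)
ycols  <- paste0("y", 1:9)

# Hypothetical helper: rebuilds a data frame shaped like startdf/enddf
make_df <- function(lo, hi) {
  vals <- replicate(length(ycols), round(runif(length(Fruits), lo, hi)))
  data.frame(Fruits, setNames(as.data.frame(vals), ycols))
}

startdf <- make_df(1980, 1999)
enddf   <- make_df(2012, 2015)

# Assumption: the y columns play the role of Country
Country <- names(startdf)[-1]

mydfs <- lapply(Fruits, function(x) {
  lVec <- startdf$Fruits == x              # row belonging to this fruit
  data.frame(
    Country,
    Fruit = x,                             # recycled to the number of rows
    Start = unlist(startdf[lVec, -1]),     # drop the Fruits column, flatten to a vector
    End   = unlist(enddf[lVec, -1]),
    row.names = NULL
  )
})
names(mydfs) <- Fruits

print(mydfs[["x1"]])
```

Each element of mydfs then has nine rows and the four columns Country, Fruit, Start and End.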



Source: https://stackoverflow.com/questions/32203753/export-fixed-range-of-columns-from-dataframe-to-pdf-one-slice-per-sheet
