Question
I need to merge N PDF files into one. I create a blank file first:
    byte[] pdfBytes = null;
    var ms = new MemoryStream();
    var doc = new iTextSharp.text.Document();
    var cWriter = new PdfCopy(doc, ms);
Later I loop over an array of HTML strings:
    foreach (NBElement htmlString in someElement.Children())
    {
        byte[] msTempDoc = getPdfDocFrom(htmlString.GetString(), cssString.GetString());
        addPagesToPdf(cWriter, msTempDoc);
    }
In getPdfDocFrom I create a PDF file using XMLWorkerHelper and return it as a byte array:
    private byte[] getPdfDocFrom(string htmlString, string cssString)
    {
        var tempMs = new MemoryStream();
        byte[] tempMsBytes;
        var tempDoc = new iTextSharp.text.Document();
        var tempWriter = PdfWriter.GetInstance(tempDoc, tempMs);
        tempDoc.Open();
        using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssString)))
        {
            using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(htmlString)))
            {
                //Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(tempWriter, tempDoc, msHtml, msCss);
                tempMsBytes = tempMs.ToArray();
            }
        }
        tempDoc.Close();
        return tempMsBytes;
    }
Later on I try to add the pages from this PDF file to the blank one:
    private static void addPagesToPdf(PdfCopy mainDocWriter, byte[] sourceDocBytes)
    {
        using (var msOut = new MemoryStream())
        {
            PdfReader reader = new PdfReader(new MemoryStream(sourceDocBytes));
            int n = reader.NumberOfPages;
            PdfImportedPage page;
            for (int i = 1; i <= n; i++)
            {
                page = mainDocWriter.GetImportedPage(reader, i);
                mainDocWriter.AddPage(page);
            }
        }
    }
It breaks when it tries to create a PdfReader from the byte array I pass to the function, with the error "Rebuild failed: trailer not found.; Original message: PDF startxref not found."
I used another library to work with PDFs before. I passed two PdfDocument objects and simply added the pages from one to the other in a loop. It didn't support CSS though, so I had to switch to iTextSharp.
I don't quite get the difference between PdfWriter and PdfCopy.
Answer 1:
There is a logical error in your code. When you create a document from scratch, as is done in the getPdfDocFrom() method, the document isn't complete until you've triggered the Close() method. In this Close() method, a trailer is created as well as a cross-reference (xref) table. The error tells you that those are missing.
Indeed, you do call the Close() method:
    tempDoc.Close();
But by the time you Close() the document, it's too late: you have already created the tempMsBytes array. You need to create that array after you close the document.
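The reordering described above could look roughly like the sketch below. It assumes iTextSharp 5.x with the XMLWorker add-on, as in the question, and keeps the question's names; the wrapper class HtmlToPdf is hypothetical and exists only to make the snippet compile on its own. The only substantive change is that ToArray() is called after Close().

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical wrapper class; only the order of Close() and ToArray() matters here.
public static class HtmlToPdf
{
    // Renders an XHTML string plus CSS into a PDF and returns the finished bytes.
    public static byte[] GetPdfDocFrom(string htmlString, string cssString)
    {
        using (var tempMs = new MemoryStream())
        {
            var tempDoc = new Document();
            var tempWriter = PdfWriter.GetInstance(tempDoc, tempMs);
            tempDoc.Open();

            using (var msCss = new MemoryStream(Encoding.UTF8.GetBytes(cssString)))
            using (var msHtml = new MemoryStream(Encoding.UTF8.GetBytes(htmlString)))
            {
                // Parse the HTML into the open document
                XMLWorkerHelper.GetInstance().ParseXHtml(tempWriter, tempDoc, msHtml, msCss);
            }

            // Close() writes the xref table and the trailer; the PDF is complete only now.
            tempDoc.Close();

            // Copy the bytes after closing, not before.
            return tempMs.ToArray();
        }
    }
}
```

The resulting array can then be passed to new PdfReader(...) as in addPagesToPdf.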
Edit: I don't know anything about C#, but if MemoryStream clears its buffer after it is closed, you could set tempWriter.CloseStream = false; so that the MemoryStream isn't closed when you close the document.
In Java, it would be a bad idea to set the "close stream" parameter to false. When I read the answers to the question "Create PDF in memory instead of physical file", I see that C# probably doesn't always require this extra line.
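On the C# side, the .NET documentation for MemoryStream.ToArray() states that the method works when the stream is closed, so the extra line should not be needed just to read the bytes back. A small sketch that uses only the .NET base library (the class and method names are made up for illustration):

```csharp
using System.IO;

// Hypothetical helper showing that a closed MemoryStream still hands out its bytes.
public static class MemoryStreamAfterClose
{
    // Writes the payload, closes the stream, then copies the buffer out.
    public static byte[] WriteCloseThenCopy(byte[] payload)
    {
        var ms = new MemoryStream();
        ms.Write(payload, 0, payload.Length);
        ms.Close();           // the stream can no longer be read or written
        return ms.ToArray();  // still returns a copy of the written bytes
    }
}
```

Calling WriteCloseThenCopy(new byte[] { 1, 2, 3 }) returns a three-byte array, which is the same situation as calling tempMs.ToArray() after tempDoc.Close().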
Remark: merging files by adding PdfImportedPage instances to a PdfWriter is an example of bad taste. If you are using iTextSharp 5 or earlier, you should use PdfCopy or PdfSmartCopy to do that. If you use PdfWriter, you throw away a lot of information (e.g. link annotations).
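For reference, a merge based on PdfCopy might be sketched as follows, assuming iTextSharp 5.x. The class PdfMerger and its Merge method are hypothetical names. Note two points the question's snippets do not show: the main Document has to be opened before pages are added, and the merged bytes are read only after it has been closed.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper; merges complete PDF files given as byte arrays.
public static class PdfMerger
{
    public static byte[] Merge(IEnumerable<byte[]> sourcePdfs)
    {
        using (var ms = new MemoryStream())
        {
            var doc = new Document();
            var copy = new PdfCopy(doc, ms);
            doc.Open(); // must happen before the first AddPage call

            foreach (byte[] source in sourcePdfs)
            {
                var reader = new PdfReader(source);
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    // PdfCopy imports the page together with its annotations
                    copy.AddPage(copy.GetImportedPage(reader, i));
                }
                copy.FreeReader(reader);
                reader.Close();
            }

            // Close() finalizes the merged file (xref table and trailer).
            doc.Close();
            return ms.ToArray();
        }
    }
}
```

Each element of sourcePdfs must itself be a finished PDF, that is, bytes taken after the source document was closed.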
Source: https://stackoverflow.com/questions/40401695/merging-n-pdf-files-created-from-html-using-itextsharp-to-another-blank-pdf-fi