Split a huge 40,000-page PDF into single pages with iTextSharp: OutOfMemoryException

谁说胖子不能爱 submitted on 2019-12-02 19:33:37

From what I have read, when instantiating the PdfReader you should use the constructor that takes a RandomAccessFileOrArray object, so that the file is read as needed rather than loaded into memory all at once. Disclaimer: I have not tried this myself.

iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(new iTextSharp.text.pdf.RandomAccessFileOrArray(@"C:\PDFFile.pdf"), null);

This is a total shot in the dark, and I haven't tested this code: it is an extract from the book 'iText in Action', where it is given as an example of how to deal with large PDF files. The code is in Java, but it should be fairly easy to convert to C#.

This is the method that loads everything into memory:

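// Full read: the whole document is parsed into memory up front.
// getMemoryUse() is a helper from the book that is not shown here.
PdfReader reader;
long before;
before = getMemoryUse();
reader = new PdfReader(
    "HelloWorldToRead.pdf", null);
System.out.println("Memory used by the full read: "
    + (getMemoryUse() - before));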

This is the memory-saving way, where the document should only be loaded bit by bit as it is needed:

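// Partial read: RandomAccessFileOrArray lets PdfReader read
// parts of the file from disk as they are needed.
before = getMemoryUse();
reader = new PdfReader(
    new RandomAccessFileOrArray("HelloWorldToRead.pdf"), null);
System.out.println("Memory used by the partial read: "
    + (getMemoryUse() - before));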
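Both excerpts call a getMemoryUse() helper that the answer does not show. As a hedged sketch, assuming the helper only needs to report how much heap is currently in use (this is a hypothetical stand-in, not the book's actual code), it might look like this:

```java
public class MemoryUse {

    // Hypothetical stand-in for the book's getMemoryUse() helper:
    // returns the number of heap bytes currently in use.
    public static long getMemoryUse() {
        Runtime rt = Runtime.getRuntime();
        // Request a garbage collection first so the figure is less noisy.
        // System.gc() is only a hint, so the result is approximate.
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = getMemoryUse();
        // Allocate roughly 8 MB so there is something to measure.
        byte[] block = new byte[8 * 1024 * 1024];
        long after = getMemoryUse();
        System.out.println("Allocated " + block.length + " bytes");
        System.out.println("Measured difference: " + (after - before) + " bytes");
    }
}
```

Because System.gc() is only a request, comparing the full and partial reads this way gives a rough indication of the difference rather than an exact measurement.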

You might be able to use Ghostscript directly; see the "One page per file" section of its documentation: http://svn.ghostscript.com/ghostscript/tags/ghostscript-9.02/doc/Use.htm#One_page_per_file
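Running Ghostscript as a separate process keeps the split out of your own program's heap entirely. A minimal sketch of launching it from Java, assuming the executable is called `gs` (on Windows the console version is typically `gswin32c` or `gswin64c`) and that your Ghostscript build's pdfwrite device replaces `%d` in `-sOutputFile` with the page number; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GhostscriptSplit {

    // Builds a Ghostscript command line intended to write each page of
    // `input` to its own PDF; %05d in the pattern becomes the page number.
    public static List<String> buildCommand(String gsExe, String input, String outputPattern) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExe);
        cmd.add("-dBATCH");    // exit after processing instead of going interactive
        cmd.add("-dNOPAUSE");  // do not pause at the end of each page
        cmd.add("-dSAFER");    // run with restricted file access
        cmd.add("-sDEVICE=pdfwrite");
        cmd.add("-sOutputFile=" + outputPattern);
        cmd.add(input);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = buildCommand("gs", "input.pdf", "page_%05d.pdf");
        // inheritIO() forwards Ghostscript's console output to ours.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        System.out.println("Ghostscript exited with code " + exit);
    }
}
```

Passing the arguments as a list rather than one string avoids shell quoting problems when file names contain spaces.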

For reading the recipient data, PDFTextStream might be a good choice.

PDF Toolkit (pdftk) is quite useful for this kind of task, though I haven't tried it with such a huge file yet.
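pdftk's `burst` operation splits a document into one file per page, and its `output` argument accepts a printf-style pattern for the page number. A hedged sketch of invoking it from Java, assuming `pdftk` is on the PATH and Java 9 or later (for `List.of`); the class name is hypothetical and, as noted above, this has not been tried on a 40,000-page file:

```java
import java.io.IOException;
import java.util.List;

public class PdftkBurst {

    // Builds a pdftk "burst" command that writes one PDF per page;
    // %05d in the output pattern is replaced by the page number.
    public static List<String> buildCommand(String input, String outputPattern) {
        return List.of("pdftk", input, "burst", "output", outputPattern);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand("input.pdf", "page_%05d.pdf"))
                .inheritIO()   // show pdftk's messages in our console
                .start();
        System.out.println("pdftk exited with code " + p.waitFor());
    }
}
```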

Could some library other than iTextSharp work better?

Please try Aspose.Pdf for .NET, which lets you split a PDF into single pages or into different sets of pages in various ways, using either files or memory streams. The API is very simple to learn and use, and it works with large PDF files that have a large number of pages.

Disclosure: I work as developer evangelist at Aspose.
