Example PDF language code which helps to study the official PDF specification? [closed]

拟墨画扇 submitted on 2019-11-27 21:45:33

The creators of iText (a Java/C# lib to create and manipulate PDFs) published a tool called RUPS.

From the sourceforge page:

RUPS is an abbreviation for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText® that allows you to look inside a PDF document and browse the different PDF objects and content streams. (Updating PDFs isn't possible yet.)

The way I helped myself to learn PDF syntax was this:

  • Looked for a tool that could de-compress PDFs (de-compress the internal streams).

  • Found qpdf, Jay Berkenbilt's command-line tool, described as one that "does structural, content-preserving transformations on PDF files".

  • Routinely ran qpdf --qdf input.pdf decompressed-input.pdf.

  • Opened the newly created decompressed-input.pdf in a text editor.
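
As a minimal sketch of that routine, the qpdf call can be wrapped in a few lines of Python. The helper names qdf_command and make_qdf are made up for illustration. Only the qpdf --qdf invocation itself comes from the steps above, and the wrapper assumes qpdf is installed and on the PATH.

```python
import shutil
import subprocess


def qdf_command(src, dst):
    """Build the argument list for the qpdf call quoted above."""
    return ["qpdf", "--qdf", src, dst]


def make_qdf(src, dst):
    """Run qpdf if it is available; return True when it exits with status 0."""
    if shutil.which("qpdf") is None:
        return False  # qpdf is not installed, so nothing is written
    result = subprocess.run(qdf_command(src, dst))
    return result.returncode == 0
```

Calling make_qdf("input.pdf", "decompressed-input.pdf") would then produce the file to open in a text editor.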

The --qdf mode of the tool transforms the binary and ASCII elements of PDFs in a very useful way, without changing their visual page appearance (and it's very fast):

  1. Decompress previously compressed objects (exposing, for example, the PDF language source code of page element drawing operations).

  2. Also expand object streams (ObjStrm).

  3. Normalize the presentation of arrays, strings etc.

  4. Re-number objects so they start from 1 0 obj and then present them in ascending order in the file.

  5. Repair broken xref entries.

  6. Add comments which contain an object's original identity in the original file.

  7. Add comments for each page.

  8. ...and some more.
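
To illustrate the first item, here is a small sketch of what "decompressing" a stream means. It does not use qpdf. It only assumes that the stream is compressed with /FlateDecode, which is zlib compression, and the content stream, the object number 4 and the helper stream_object are invented for the example.

```python
import zlib

# A tiny page content stream: show the text "Hello" in font /F1 at 24 pt.
content = b"BT /F1 24 Tf 72 720 Td (Hello) Tj ET\n"

# What a typical PDF stores: the same bytes after zlib compression.
compressed = zlib.compress(content)


def stream_object(num, data, flate):
    """Serialise one stream object roughly as it appears in a PDF body."""
    filt = b" /Filter /FlateDecode" if flate else b""
    head = b"%d 0 obj\n<< /Length %d%s >>\nstream\n" % (num, len(data), filt)
    return head + data + b"\nendstream\nendobj\n"


# Compressed form: the stream data is unreadable in a text editor.
before = stream_object(4, compressed, flate=True)

# Decompressed form: the drawing operators are plain ASCII.
after = stream_object(4, zlib.decompress(compressed), flate=False)
print(after.decode("ascii"))
```

The printed object has no /Filter entry and its stream holds the readable operators, which is the kind of text that becomes visible in a file written by qpdf --qdf.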

Looking at these (now mostly ASCII) files in a normal text editor is much easier than trying to figure out the original binary PDF.

yms

I would recommend taking a look at a few files using PDF Vole (a tool based on iText, and similar to RUPS).

PDF Vole and RUPS will both allow you to navigate through the structure of a PDF file, inspect the entries on every object, decompress compressed streams, decrypt the file when needed, look at the content of pages and annotations, and track down the relation between objects in the file.
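
The idea of tracking relations between objects can be sketched in a few lines of Python. This is not how PDF Vole or RUPS work internally. It is a naive regular-expression scan over an invented three-object sample, and it would be fooled by real files whose stream data happens to contain "endobj".

```python
import re

# Hypothetical, already-decompressed PDF fragment.
sample = b"""%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [ 0 0 612 792 ] >>
endobj
"""

# "N G obj ... endobj" delimits an indirect object; "N G R" is a reference.
OBJ = re.compile(rb"(\d+) (\d+) obj\b(.*?)endobj", re.S)
REF = re.compile(rb"(\d+) \d+ R\b")


def object_graph(pdf_bytes):
    """Map each object number to the object numbers it references."""
    graph = {}
    for num, _gen, body in OBJ.findall(pdf_bytes):
        graph[int(num)] = [int(r) for r in REF.findall(body)]
    return graph


for num, refs in object_graph(sample).items():
    print(num, "->", refs)
```

Here the catalog (object 1) points to the page tree (object 2), which points to the page (object 3), which points back to its parent.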

For example this file:

Will look like this in PDF Vole:

You could also take a look at the class hierarchy of iText itself (which is almost 1-to-1 with the PDF spec) and the book that explains it, iText in Action.

If you are trying to generate PDF files via code, then this CodeProject source code might help.

The code along with the Adobe specification should get you going. I don't think there are many shortcuts here. Understanding PostScript is going to take some study!

EDIT: and seeing as PDF's page drawing operators are derived from PostScript (although a PDF is not simply compressed PostScript), something like RoPS could be handy too.
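
For generating a PDF from code, a minimal sketch in Python may help when reading the file-structure chapters of the specification. It is not taken from the CodeProject article. The function name build_pdf is hypothetical, and the sketch assumes a single US Letter page, ASCII-only text and the standard Helvetica font.

```python
def build_pdf(text):
    """Return the bytes of a one-page PDF that shows `text` in Helvetica."""
    # Backslashes and parentheses must be escaped inside a PDF string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = b"BT /F1 24 Tf 72 720 Td (%s) Tj ET" % safe.encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)                   # where the xref table starts
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"       # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing the result with open("hello.pdf", "wb").write(build_pdf("Hello")) gives a file small enough to read in full in a text editor, next to the specification.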
