This is how I taught myself PDF syntax:
Looked for a tool that could de-compress PDFs (de-compress the internal streams).
Found qpdf, Jay Berkenbilt's command-line tool, described as: "does structural, content-preserving transformations on PDF files".
Routinely ran qpdf --qdf input.pdf decompressed-input.pdf.
Opened the newly created decompressed-input.pdf in a text editor.
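For anyone who wants to script the steps above, here is a minimal Python sketch. It only wraps the same qpdf --qdf command shown earlier. The function names and the "decompressed-" output prefix are my own illustrative choices, not part of qpdf, and the treatment of exit status 3 as "succeeded with warnings" reflects my reading of the qpdf manual.

```python
import shutil
import subprocess
from pathlib import Path


def build_qdf_command(src, dst):
    """Return the argv list for `qpdf --qdf SRC DST` (hypothetical helper)."""
    return ["qpdf", "--qdf", str(src), str(dst)]


def decompress_pdf(src, dst=None):
    """Run qpdf in QDF mode and return the path of the output file.

    The default output name ("decompressed-" + original name) is only an
    illustrative convention taken from the example command.
    """
    src = Path(src)
    if dst is None:
        dst = src.with_name("decompressed-" + src.name)
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf was not found on PATH")
    result = subprocess.run(build_qdf_command(src, dst))
    # Assumption: qpdf exits with 0 on success and 3 on success with warnings.
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed with exit status {result.returncode}")
    return Path(dst)
```

Calling decompress_pdf("input.pdf") would then write decompressed-input.pdf next to the original, ready to open in a text editor.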
The --qdf mode of the tool transforms the binary and ASCII elements of PDFs in a very useful way, without changing their visual page appearance (and it's very fast):
Decompress previously compressed objects (exposing, for example, the PDF language source code of the page element drawing operations).
Also expand object streams (ObjStm).
Normalize the presentation of arrays, strings etc.
Re-number objects so they start from 1 0 obj and then present them in ascending order in the file.
Repair broken xref entries.
Add comments which contain an object's original identity in the original file.
Add comments for each page.
...and some more.
Looking at these (now mostly ASCII) files in a normal text editor is much easier than trying to figure out the original binary PDF.
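To give a feel for what "decompressing a stream" means, here is a small Python sketch that does not use qpdf at all. It compresses a made-up page content stream with zlib (the format behind the common /FlateDecode filter) and then inflates it again. The sample operators are my own invention for illustration, not taken from any real file.

```python
import zlib

# A made-up page content stream: select a font, move to a position, show text.
content = b"BT /F1 24 Tf 72 720 Td (Hello, PDF) Tj ET"

# Roughly how the body of a /FlateDecode stream is stored in a compressed PDF.
compressed = zlib.compress(content)

# In essence what a decompressing tool does: inflate back to readable operators.
restored = zlib.decompress(compressed)

print(compressed[:8])            # opaque binary bytes
print(restored.decode("ascii"))  # the readable drawing operators again
```

The compressed bytes are unreadable in a text editor, while the restored bytes are the plain drawing operators you see in a file produced by the --qdf mode.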