Manipulating PDF file

半城伤御伤魂 submitted on 2020-01-06 06:26:48

Question


I would like to read a PDF file as text (PostScript), add new objects to the file structure, and save the result as a new PDF. However, if I just copy the PDF's PostScript content and paste it into a newly created PDF file (with encoding='ansi'), the resulting file doesn't work.

I suspect this is an encoding issue, but I am not sure what I should do to get a valid PDF file after manipulating the original PostScript content.

Here is the piece of code that didn't work for me:

pdf_file = open('Input.pdf', 'r', encoding='ansi').read()
pdf_file_bytes = bytearray(pdf_file, 'ansi')
pdf_file = open('Output_bytes.pdf', 'wb').write(pdf_file_bytes)

And as I said, the output PDF is not valid!


Answer 1:


First problem: the content of a PDF file is PDF, not PostScript.

Second, PDF is a binary file format, so if you copy and paste it, any kind of translation (such as CR/LF conversion) will break it.

You haven't said what programming language your code uses, though it looks like Python. If it is Python then reading the file as binary instead of text might help.
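As a rough illustration of the point above, the sketch below writes a small PDF-like byte string to a temporary file and round-trips it twice: once in text mode and once in binary mode. The sample bytes and the temporary directory are made up for the demonstration. `cp1252` stands in for `'ansi'`, which is an assumption: `'ansi'` is only available as a codec alias on Windows.

```python
import os
import tempfile

# Hypothetical sample data: a PDF-like header with CR/LF line endings and
# a comment line containing high bytes, as many PDF writers emit.
sample = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\r\n1 0 obj\r\n<< >>\r\nendobj\r\n"

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "Input.pdf")
with open(src, "wb") as f:
    f.write(sample)

# Text-mode round trip, similar in spirit to the code in the question.
# Universal newline handling turns every "\r\n" into "\n" while reading.
with open(src, "r", encoding="cp1252") as f:
    text = f.read()
text_roundtrip = text.encode("cp1252")

# Binary round trip: bytes in, the same bytes out.
with open(src, "rb") as f:
    data = f.read()
dst = os.path.join(workdir, "Output_bytes.pdf")
with open(dst, "wb") as f:
    f.write(data)
with open(dst, "rb") as f:
    binary_roundtrip = f.read()

print(text_roundtrip == sample)    # False: line endings were rewritten
print(binary_roundtrip == sample)  # True: the copy is byte-for-byte identical
```

Binary mode only guarantees a faithful copy. Adding objects to the file still requires producing valid PDF syntax.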




Answer 2:


PDF is a complex file format consisting of various objects. Unless you understand the low-level syntax in the PDF specification thoroughly, it will be difficult or even impossible to arbitrarily replace some bytes with other bytes and still end up with a valid PDF file.
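One concrete reason byte-level edits are fragile is that a classic PDF ends with a cross-reference (xref) table of byte offsets, and the `startxref` entry records where that table begins. The sketch below builds a deliberately tiny PDF by hand and shows that inserting a few bytes near the top leaves `startxref` pointing at the wrong place. The helper names `build_minimal_pdf` and `xref_offset_is_valid` are hypothetical, and the check is far simpler than what a real PDF parser does.

```python
import re

def build_minimal_pdf() -> bytes:
    """Assemble a tiny PDF by hand, recording each object's byte offset."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where this object starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)           # byte position of the xref table itself
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def xref_offset_is_valid(pdf: bytes) -> bool:
    """Return True if the offset after 'startxref' lands on the 'xref' keyword."""
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if match is None:
        return False
    pos = int(match.group(1))
    return pdf[pos:pos + 4] == b"xref"

original = build_minimal_pdf()
# Naive edit: splice extra bytes in after the header, shifting everything else.
edited = original.replace(b"%PDF-1.4\n", b"%PDF-1.4\n% inserted bytes\n", 1)

print(xref_offset_is_valid(original))  # True
print(xref_offset_is_valid(edited))    # False
```

Some viewers try to repair a damaged xref table by rescanning the file, so an edited file may still open in places, but it is no longer a well-formed PDF.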

More to the point, what are you trying to accomplish? There may be a high-level way of doing it that doesn't involve manipulating PDF syntax directly, for example if you need to modify a font, add an annotation, or set the PDF version. Otherwise, if you really do need to modify PDF syntax, you need to use a library capable of dealing with low-level objects.



Source: https://stackoverflow.com/questions/55261941/manipulating-pdf-file
