Detect and alter strings in PDFs

甜味超标 2020-12-11 04:34

I want to be able to detect a pattern in a PDF and somehow flag it.

For instance, in this PDF, there's the string *2. I want to be able to parse the PD

2 Answers
  •  既然无缘
    2020-12-11 04:43

    This is non-trivial. The problem is that PDF files are not meant to be "updated" at any granularity finer than a page. You basically have to parse the page's content stream (its PostScript-derived drawing operators), modify the relevant operators, and then write the page back out. I don't think PyPDF supports doing what you want.
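
    The parse-and-rewrite step described above can be sketched at the level of a single page content stream. This is a minimal, stdlib-only illustration, not PyPDF's API: the helpers `decode_stream`, `find_strings` and `replace_in_strings` are hypothetical, and the sketch assumes text is drawn with simple `(...) Tj` literal strings. Real files often split text into `TJ` arrays with kerning, use hex strings or custom font encodings, and may contain nested unescaped parentheses, so a pattern such as `*2` may not appear contiguously in the stream.

```python
import re
import zlib

# Matches a literal string operand followed by the Tj operator, e.g. "(Hello) Tj".
# Escaped characters such as \( and \) inside the string are allowed.
# Assumption: TJ arrays, hex strings and nested unescaped parentheses are NOT handled.
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def decode_stream(raw: bytes, flate: bool) -> bytes:
    """Return the content-stream bytes, inflating them if /FlateDecode is used."""
    return zlib.decompress(raw) if flate else raw


def find_strings(content: bytes, pattern: bytes) -> list:
    """Return every Tj string operand that contains `pattern`."""
    return [m.group(1) for m in TJ_RE.finditer(content) if pattern in m.group(1)]


def replace_in_strings(content: bytes, old: bytes, new: bytes) -> bytes:
    """Rewrite each Tj operand, replacing `old` with `new` inside the string.

    `new` must not contain unescaped parentheses, or the stream becomes invalid.
    """
    def _sub(m: re.Match) -> bytes:
        return b"(" + m.group(1).replace(old, new) + b") Tj"

    return TJ_RE.sub(_sub, content)


if __name__ == "__main__":
    # Hypothetical page content stream, compressed the way most PDFs store it.
    page = b"BT /F1 12 Tf 72 700 Td (Order total *2 units) Tj ET"
    content = decode_stream(zlib.compress(page), flate=True)
    print(find_strings(content, b"*2"))  # [b'Order total *2 units']
    print(replace_in_strings(content, b"*2", b"[*2]"))
```

    After editing, the stream would still have to be re-compressed, its `/Length` entry updated and the cross-reference offsets rewritten, which is the bookkeeping that makes this approach non-trivial.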

    If "all" you want to do is add highlighting, you can probably just add an annotation dictionary (for example, a /Highlight text-markup annotation) to the page instead of touching its content. See the PDF specification for more information.
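
    Because a highlight is only a dictionary attached to the page, the content stream can stay untouched. The sketch below just builds the dictionary's PDF syntax; `highlight_annot` is a hypothetical helper and the coordinates are made up. To take effect, the dictionary would have to be written as an indirect object and referenced from the page's `/Annots` array. The `/QuadPoints` order used here (upper-left, upper-right, lower-left, lower-right) is the one common viewers produce, which differs from the counterclockwise wording in the specification, so treat it as an assumption and check it against your target viewer.

```python
def highlight_annot(x1: float, y1: float, x2: float, y2: float,
                    color: tuple = (1, 1, 0)) -> str:
    """Return PDF syntax for a /Highlight annotation covering one rectangle.

    (x1, y1) is the lower-left and (x2, y2) the upper-right corner, in PDF
    user-space units (1/72 inch, origin at the bottom-left of the page).
    """
    def fmt(v: float) -> str:
        # Compact number formatting: 72 -> "72", 0.5 -> "0.5"
        return f"{v:g}"

    rect = " ".join(fmt(v) for v in (x1, y1, x2, y2))
    # Assumed QuadPoints order: upper-left, upper-right, lower-left, lower-right
    quad = " ".join(fmt(v) for v in (x1, y2, x2, y2, x1, y1, x2, y1))
    rgb = " ".join(fmt(v) for v in color)
    return (
        "<< /Type /Annot /Subtype /Highlight "
        f"/Rect [{rect}] /QuadPoints [{quad}] /C [{rgb}] >>"
    )


if __name__ == "__main__":
    # Hypothetical box around a line of text at y = 700
    print(highlight_annot(72, 695, 200, 712))
```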

    You might be able to do this using PyPDF2, but I haven't looked into it closely.
