Extract hyperlinks from PDF in Python

情深已故 2020-12-30 09:33

I have a PDF document with a few hyperlinks in it, and I need to extract all the text from the PDF. I have used the PDFMiner library and code from http://www.endlesslycurio

5 Answers
  •  悲&欢浪女
    2020-12-30 09:42

    A hyperlink is actually stored as an annotation, so you need to process the annotations rather than "extract the text". I suspect you are going to need a library such as iTextSharp or MuPDF, or Ghostscript if you are really desperate (and comfortable programming in PostScript).

    That said, I'd expect it to be relatively easy to walk the annotations and pick out the ones whose subtype is /Link.
