Extract hyperlinks from PDF in Python

情深已故 2020-12-30 09:33

I have a PDF document with a few hyperlinks in it, and I need to extract all the text from the PDF. I have used the PDFMiner library and code from http://www.endlesslycurio

5 Answers

悲&欢浪女 (OP)

2020-12-30 09:42

The hyperlink will actually be an annotation, so you need to process the annotations rather than 'extract the text'. I suspect that you are going to need a library such as iTextSharp or MuPDF, or Ghostscript if you are really desperate (and comfortable programming in PostScript).

I'd have thought it relatively easy to process the annotations, looking for those with subtype /Link, though.
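To make the annotation approach concrete, here is a minimal sketch in Python. It assumes the third-party pypdf package, which is not one of the libraries named above, and the helper names `link_uris` and `extract_links` are made up for illustration. It only picks up link annotations that carry a URI action; internal links that point at a destination inside the document are skipped.

```python
def link_uris(annotations):
    """Return the URI targets of link annotations.

    `annotations` is an iterable of dictionary-like PDF annotation
    objects using PDF key names such as "/Subtype", "/A" and "/URI".
    """
    uris = []
    for annot in annotations:
        # Hyperlinks are annotations whose subtype is /Link.
        if "/Subtype" not in annot or annot["/Subtype"] != "/Link":
            continue
        # External links store their target in an action dictionary (/A).
        # Links with only a /Dest entry are internal and are skipped here.
        if "/A" not in annot:
            continue
        action = annot["/A"]
        if "/URI" in action:
            uris.append(str(action["/URI"]))
    return uris


def extract_links(pdf_path):
    """Return (page_number, uri) pairs for every URI link in the PDF.

    Assumption: pypdf is installed. It is imported here so that
    link_uris() stays usable without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    results = []
    for page_number, page in enumerate(reader.pages, start=1):
        if "/Annots" not in page:
            continue  # page has no annotations at all
        # Entries in /Annots are usually indirect references; resolve them.
        annotations = [ref.get_object() for ref in page["/Annots"]]
        results.extend((page_number, uri) for uri in link_uris(annotations))
    return results
```

Calling `extract_links("document.pdf")` (the file name is a placeholder) should give a list of page numbers paired with link targets, which you can then combine with the text that PDFMiner extracts.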
