How can I convert PDF to HTML?

慢半拍i 2020-12-12 20:09

What good libraries are there, in any common language, for converting PDF to HTML?

9 answers
  • 2020-12-12 20:54

    iText (http://www.lowagie.com/iText/) is an open-source library for both Java and C#.

  • 2020-12-12 20:55

    If you are working on a Windows box, I think Amyuni has a library for this as well. Their PDF Document Convertor is accessible as a DLL, can be used from most of the languages supported by Visual Studio, and can convert to RTF, HTML, Excel, JPEG, and TIFF.

  • In Perl, you can use the SWISH::Filter plugin SWISH::Filters::Pdf2HTML. (It requires the xpdf package.)

    For the reverse (HTML to PDF), see this question.

  • 2020-12-12 20:58

    You can use a module in Python called PDFMiner.

    You can install it like this:

    pip install pdfminer
    

    Then run its command-line script like this:

    pdf2txt.py -o output.html -t html file.pdf
    

    Link to the module: https://pypi.org/project/pdfminer/
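
    If you would rather drive the same command from a Python script, here is a minimal sketch. It only wraps the `pdf2txt.py` invocation shown above; the function names `build_pdf2txt_command` and `convert_pdf_to_html` are made up for illustration, and it assumes `pdf2txt.py` is on your PATH after installation (on Windows you may need to call it through the Python interpreter instead).

```python
import shutil
import subprocess
from pathlib import Path


def build_pdf2txt_command(pdf_path, html_path):
    """Return the argument list for the pdf2txt.py call shown in the answer.

    Hypothetical helper: it only mirrors
    `pdf2txt.py -o output.html -t html file.pdf`.
    """
    return ["pdf2txt.py", "-o", str(html_path), "-t", "html", str(pdf_path)]


def convert_pdf_to_html(pdf_path, html_path):
    """Run pdf2txt.py and return the path of the HTML file it was asked to write.

    Assumes pdf2txt.py is installed and reachable on PATH.
    """
    if shutil.which("pdf2txt.py") is None:
        raise RuntimeError("pdf2txt.py not found on PATH; is pdfminer installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_pdf2txt_command(pdf_path, html_path), check=True)
    return Path(html_path)
```

    Calling `convert_pdf_to_html("file.pdf", "output.html")` should then behave like the command above.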

  • 2020-12-12 21:00

    Apache PDFBox has an HTML extraction capability: http://pdfbox.apache.org/

  • 2020-12-12 21:04

    The pdftohtml program converts PDF to HTML and XML, and it preserves the position information of the text, which is helpful for scraping tables.

    It seems to be based on the xpdf library, and it has a Windows binary, too.
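
    To give an idea of how the position information helps with tables, here is a rough Python sketch. It assumes the XML output consists of `<page>` elements containing `<text>` elements with `top` and `left` attributes, which is what I have seen from pdftohtml's XML mode, but check the output of your own version. `SAMPLE_XML` is hand-written illustration data, not real program output, and `rows_from_xml` is a made-up helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample in the assumed layout; not real pdftohtml output.
SAMPLE_XML = """<pdf2xml>
  <page number="1" top="0" left="0" height="1188" width="918">
    <text top="100" left="50" width="40" height="12" font="0">Name</text>
    <text top="100" left="200" width="30" height="12" font="0">Qty</text>
    <text top="120" left="50" width="40" height="12" font="0">Bolt</text>
    <text top="120" left="200" width="30" height="12" font="0">12</text>
  </page>
</pdf2xml>"""


def rows_from_xml(xml_text):
    """Group text fragments into rows by (page, top), ordered left to right."""
    root = ET.fromstring(xml_text)
    rows = defaultdict(list)
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for text in page.iter("text"):
            # Fragments sharing the same vertical offset are treated as one row.
            key = (page_no, int(text.get("top")))
            content = "".join(text.itertext()).strip()
            rows[key].append((int(text.get("left")), content))
    # Sort rows top to bottom, then cells within a row by horizontal offset.
    return [[cell for _, cell in sorted(cells)] for _, cells in sorted(rows.items())]


print(rows_from_xml(SAMPLE_XML))
```

    Real documents rarely line up this exactly, so in practice you would probably want to bucket `top` values within a small tolerance rather than requiring an exact match.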
