Converting from PDF to HTML [closed]

Submitted by 与世无争的帅哥 on 2019-12-31 10:31:15

Question


Is there a .dll that takes a PDF file as input and produces an HTML file as output? I want to convert from PDF to HTML. My colleague says it is very difficult to go step by step, extracting the text, fonts, images, margins, links, and so on from the PDF and then building a new HTML file with the same content. He says it's nearly impossible. So I was wondering whether there is a DLL I can reference to do that.


Answer 1:


Writing a program to do it is definitely not trivial. If you can't find a .NET library that does this (I couldn't, at least not a free one), I would just download PDFToHtml and invoke it programmatically to get my HTML.

If you have the time to spare and/or PDFToHtml does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature, free PDF library. I've used it in the past to manipulate PDFs (merge, create, etc.).

UPDATE

As noted in the comment by Quandary, the PDFSharp library has a more relaxed license (MIT) than the commercial or AGPL licenses offered by iText. Keep this in mind when choosing your library. I have not used PDFSharp myself and I don't know how the two compare in terms of functionality.




Answer 2:


You can download this free tool: PDFToHTML

Then, in your program, just start a new process and run the executable, passing it the PDF file. I just tested it and it seems to work OK.
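As a rough illustration of the "run the executable from your program" approach, here is a minimal sketch in Python (the same pattern applies in .NET with a process-launching API). The executable name `pdftohtml`, the argument order (input PDF, then output path), and the helper names `build_command` and `convert_pdf_to_html` are assumptions made for this example; check the tool's own usage text before relying on them.

```python
import subprocess
from pathlib import Path


def build_command(exe, pdf_path, html_path):
    """Return the argument list for the converter process."""
    # Assumed CLI shape: <exe> <input.pdf> <output path>.
    # This is a guess for illustration, not documented behavior.
    return [str(exe), str(pdf_path), str(html_path)]


def convert_pdf_to_html(pdf_path, html_path, exe="pdftohtml", timeout=60):
    """Run the external converter and return the output path."""
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such PDF: {pdf_path}")
    # Passing a list (not a shell string) avoids quoting problems
    # with file names that contain spaces.
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(
        build_command(exe, pdf_path, html_path),
        check=True,
        capture_output=True,
        timeout=timeout,
    )
    return Path(html_path)
```

A call would look like `convert_pdf_to_html("report.pdf", "report.html")`, assuming the converter is on the `PATH`. The timeout guards against the external tool hanging on a malformed PDF.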




Answer 3:


If you don't mind paying, Aspose offers a very good solution. It is what we use at my company.

http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/key-features.aspx



Source: https://stackoverflow.com/questions/8123786/converting-from-pdf-to-html
