Libraries for parsing PDF, PostScript, and/or DjVu

Question

What I want to do is pretty simple: given a PDF/PS/DjVu file containing a paper or book, find the authors and title of the paper (other metadata would be nice, but is less important). The recognition doesn't have to be perfect, but I'd like to make it as good as I can. I am looking for open-source .NET and/or Java libraries (preferably .NET) that give access to the metadata and contents of these files.
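In a PDF, the title and author usually live in the document information dictionary, which the file trailer points to through its /Info entry. As a rough illustration of what a library does for you, here is a dependency-free Java sketch (the class PdfInfoSketch and its methods are hypothetical names) that reads /Title and /Author straight from the raw bytes. It assumes the Info object is stored uncompressed (not inside an object stream), is not encrypted, and uses literal strings "(...)" without nested parentheses; hex strings such as <FEFF...> and XMP metadata are not handled, so a real library like PDFBox is the more robust choice.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive sketch: read /Title and /Author from a PDF's document information dictionary.
// Assumes an uncompressed, unencrypted Info object whose values are literal strings
// without nested parentheses; hex strings and object streams are not handled.
class PdfInfoSketch {
    private static final Pattern INFO_REF =
        Pattern.compile("/Info\\s+(\\d+)\\s+(\\d+)\\s+R");
    private static final Pattern ENTRY =
        Pattern.compile("/(Title|Author)\\s*\\(((?:\\\\.|[^\\\\)])*)\\)", Pattern.DOTALL);

    static Map<String, String> extract(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is harmless.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, String> info = new LinkedHashMap<>();
        // Take the last "/Info N G R": incremental updates append newer trailers.
        Matcher ref = INFO_REF.matcher(raw);
        String num = null, gen = null;
        while (ref.find()) { num = ref.group(1); gen = ref.group(2); }
        if (num == null) return info;
        // Find the body of "N G obj ... endobj", again preferring the last definition.
        Matcher obj = Pattern.compile("(?<!\\d)" + num + "\\s+" + gen + "\\s+obj(.*?)endobj",
                                      Pattern.DOTALL).matcher(raw);
        String body = null;
        while (obj.find()) body = obj.group(1);
        if (body == null) return info;
        Matcher m = ENTRY.matcher(body);
        while (m.find()) info.put(m.group(1), decode(unescape(m.group(2))));
        return info;
    }

    // Resolve literal-string escapes: \n \r \t \b \f \( \) \\ and \ddd (octal).
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) { out.append(c); continue; }
            char n = s.charAt(++i);
            if (n >= '0' && n <= '7') {
                int v = 0, k = 0;
                while (k < 3 && i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '7') {
                    v = v * 8 + (s.charAt(i++) - '0');
                    k++;
                }
                i--; // the loop header advances past the last octal digit
                out.append((char) (v & 0xFF));
            } else if (n == 'n') out.append('\n');
            else if (n == 'r') out.append('\r');
            else if (n == 't') out.append('\t');
            else if (n == 'b') out.append('\b');
            else if (n == 'f') out.append('\f');
            else if (n != '\n') out.append(n); // \( \) \\ and unknown escapes; backslash-newline is dropped
        }
        return out.toString();
    }

    // Strings that start with bytes FE FF are UTF-16BE; otherwise keep Latin-1,
    // which only approximates PDFDocEncoding.
    static String decode(String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16BE);
        return s;
    }

    public static void main(String[] args) throws Exception {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        extract(pdf).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Many papers leave these fields empty or filled with whatever the authoring tool put there, so this only covers the easy case; guessing from the first page's text is still needed as a fallback.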

For PDF I've found PDFBox (.NET/Java) and PDF Library (.NET), but there may be better alternatives I'm not aware of. For PostScript and DjVu, I haven't found anything.
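For PostScript, files that follow Adobe's Document Structuring Conventions (DSC) carry header comments such as %%Title:, %%For:, %%Creator: and %%CreationDate:, which can be read without any library. The Java sketch below (the class PsDscSketch is a hypothetical name) collects them, assuming the header ends at %%EndComments or at the first line that does not start with "%". Note that DSC has no author field: %%For: names the user the document was produced for, and %%Title: is often just a file name (dvips, for instance, typically writes the .dvi name), so treat the values as hints.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

// Sketch: collect DSC header comments (%%Title:, %%For:, %%Creator:, ...) from a
// PostScript file. Assumes the header ends at %%EndComments or at the first line
// that does not start with '%'; %%+ continuation lines are not joined.
class PsDscSketch {
    static Map<String, String> parseHeader(Iterator<String> lines) {
        Map<String, String> meta = new LinkedHashMap<>();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith("%%EndComments") || !line.startsWith("%")) break;
            if (!line.startsWith("%%")) continue; // e.g. the %!PS-Adobe-3.0 line
            int colon = line.indexOf(':');
            if (colon < 0) continue; // keyword without a value
            String key = line.substring(2, colon).trim();
            String value = line.substring(colon + 1).trim();
            // DSC text values may be wrapped in parentheses, e.g. %%Title: (My Paper)
            if (value.length() >= 2 && value.startsWith("(") && value.endsWith(")"))
                value = value.substring(1, value.length() - 1);
            meta.putIfAbsent(key, value); // keep the first occurrence of each keyword
        }
        return meta;
    }

    public static void main(String[] args) throws Exception {
        // Files.lines reads lazily, so parsing stops after the header; Latin-1 accepts any byte.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            parseHeader(lines.iterator()).forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```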

Answer 1:

For most PDF manipulation we use iTextSharp, a .NET port of the original Java library, iText.

Answer 2:

Another PDF library is PDFsharp. It has pretty decent reading and parsing capabilities.

Answer 3:

For DjVu, you can use the commercial SDK from CamiNova or the open-source library DjVuLibre.

Answer 4:

For DjVu, you can use the C# library at: https://github.com/Telavian/DjvuNet
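If pulling in a library is not an option, DjVu files are IFF containers that are simple to walk: after the 4-byte "AT&T" magic come chunks with a 4-byte ID and a 32-bit big-endian length, padded to an even size. The Java sketch below (the class DjvuMetaSketch is a hypothetical name) walks those chunks and reads metadata from uncompressed ANTa annotation chunks, assuming it is stored as (metadata (title "...") (author "...")). BZZ-compressed ANTz chunks and indirect multi-file documents are not handled; for those, DjVuLibre's djvused tool can print metadata with its print-meta command.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: walk a DjVu file's IFF chunks and read (metadata (key "value") ...) entries
// from uncompressed ANTa annotation chunks. BZZ-compressed ANTz chunks are skipped.
class DjvuMetaSketch {
    // Matches one (key "value") pair; backslash escapes inside the value are allowed.
    private static final Pattern ENTRY =
        Pattern.compile("\\((\\w+)\\s+\"((?:\\\\.|[^\"\\\\])*)\"\\)");

    static Map<String, String> extract(byte[] file) {
        Map<String, String> meta = new LinkedHashMap<>();
        // A DjVu file starts with the 4-byte magic "AT&T", followed by IFF chunks.
        if (file.length >= 4 && "AT&T".equals(new String(file, 0, 4, StandardCharsets.ISO_8859_1)))
            walk(file, 4, file.length, meta);
        return meta;
    }

    private static void walk(byte[] b, int pos, int end, Map<String, String> meta) {
        while (pos + 8 <= end) {
            String id = new String(b, pos, 4, StandardCharsets.ISO_8859_1);
            // Chunk lengths are 32-bit big-endian and exclude the 8-byte header.
            int size = ((b[pos + 4] & 0xFF) << 24) | ((b[pos + 5] & 0xFF) << 16)
                     | ((b[pos + 6] & 0xFF) << 8) | (b[pos + 7] & 0xFF);
            int data = pos + 8;
            if (size < 0 || size > end - data) return; // truncated or malformed
            if (id.equals("FORM")) {
                // FORM data: a 4-byte type (DJVU, DJVM, DJVI, ...) and then nested chunks.
                walk(b, data + 4, data + size, meta);
            } else if (id.equals("ANTa")) {
                // Annotation text is assumed to be UTF-8.
                String block = metadataBlock(new String(b, data, size, StandardCharsets.UTF_8));
                Matcher m = ENTRY.matcher(block);
                // Dropping backslashes is a simplification of the escape rules.
                while (m.find()) meta.putIfAbsent(m.group(1), m.group(2).replaceAll("\\\\(.)", "$1"));
            }
            pos = data + size + (size & 1); // chunks are padded to an even length
        }
    }

    // Returns the balanced "(metadata ...)" expression, ignoring parentheses inside strings.
    static String metadataBlock(String text) {
        int start = text.indexOf("(metadata");
        if (start < 0) return "";
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (c == '\\') i++; // skip the escaped character
                else if (c == '"') inString = false;
            } else if (c == '"') inString = true;
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return text.substring(start, i + 1);
        }
        return text.substring(start); // unbalanced: fall back to the rest of the chunk
    }

    public static void main(String[] args) throws Exception {
        byte[] file = Files.readAllBytes(Paths.get(args[0]));
        extract(file).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```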

Source: https://stackoverflow.com/questions/1161465/libraries-for-parsing-pdf-postscript-and-or-djvu

Tags

pdf

postscript

djvu
