Question
What I want to do is pretty simple: given a PDF/PS/DjVu file containing a paper or book, find its authors and title (any other metadata would be nice, but is less important). The recognition doesn't have to be perfect, but I'd like to make it as good as I can. I am looking for open-source .NET and/or Java libraries (preferably .NET) that provide access to the metadata and contents of these files.
For PDF I've found PDFBox (.NET/Java) and PDF Library (.NET), but there may be better alternatives I am not aware of; for PostScript and DjVu, I haven't found anything.
Answer 1:
For most PDF manipulation we use iTextSharp, a .NET port of the Java library iText.
Answer 2:
Another PDF library is PDFsharp. It has pretty decent read/parse capabilities.
Answer 3:
For DjVu, you can use the commercial SDK from CamiNova or the open-source library DjVuLibre.
Answer 4:
For DjVu you can use the C# library DjvuNet: https://github.com/Telavian/DjvuNet
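For the PostScript part of the question, one library-free starting point is the header comment block that many PostScript files carry under Adobe's Document Structuring Conventions (DSC), with lines such as `%%Title:`, `%%Creator:` and `%%For:`. The sketch below uses only the Java standard library. The class and method names (`PsHeaderSketch`, `readDscHeader`) are made up for illustration, and the parsing is a simplification: it stops at `%%EndComments` or at the first line that does not start with `%`, and it ignores `%%+` continuation lines and `(atend)` values. Note that `%%Creator` normally names the generating program and `%%For` the user the file was produced for, so neither is a reliable author field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class PsHeaderSketch {

    // Collects "%%Key: value" pairs from the DSC header of a PostScript file.
    // Hypothetical helper for illustration, not part of any library.
    static Map<String, String> readDscHeader(BufferedReader in) throws IOException {
        Map<String, String> meta = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            // Simplified end-of-header test: explicit marker or first non-comment line.
            if (!line.startsWith("%") || line.startsWith("%%EndComments")) {
                break;
            }
            int colon = line.indexOf(':');
            if (!line.startsWith("%%") || colon < 0) {
                continue; // e.g. the "%!PS-Adobe-3.0" version line
            }
            String key = line.substring(2, colon).trim();
            String value = line.substring(colon + 1).trim();
            // Text values are often written as PostScript strings: (like this).
            if (value.length() >= 2 && value.startsWith("(") && value.endsWith(")")) {
                value = value.substring(1, value.length() - 1);
            }
            meta.putIfAbsent(key, value); // keep the first occurrence of each key
        }
        return meta;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that file; otherwise parse a small inline sample.
        BufferedReader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)
                : new BufferedReader(new StringReader(
                        "%!PS-Adobe-3.0\n%%Title: (A Sample Paper)\n%%For: Jane Doe\n%%EndComments\n"));
        try (BufferedReader reader = in) {
            Map<String, String> meta = readDscHeader(reader);
            System.out.println("Title: " + meta.get("Title"));
            System.out.println("For:   " + meta.get("For"));
        }
    }
}
```

Files without DSC comments, or with an uninformative `%%Title` such as the name of the source file, would still need a text-based heuristic on the first page.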
Source: https://stackoverflow.com/questions/1161465/libraries-for-parsing-pdf-postscript-and-or-djvu