Search for text in PDF files [closed]

耗尽温柔 posted on 2019-12-13 04:28:46

Question


I have a list of about 86 words and some PDF files. I would like to search the PDF files for those words and get back values telling me whether each word is present.

While looking for solutions in tutorials, I ran into two problems:

  1. Am I forced to convert the PDF file to some other kind of file?

  2. What is a simple library that would let me solve this? I'm really stuck because there are so many examples (PDFBox, Apache Lucene, iText, PDFTron, ...)


Answer 1:


Am I forced to convert the PDF file to some other kind of file?

A PDF file is already a file, so you do not have to convert it; you only need to be able to read it. You can use one of the available Java PDF parsers (e.g. PDFBox, which you mentioned).
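Once a parser has given you the document text as a plain string (with PDFBox that would typically come from `PDFTextStripper`), checking for each word is ordinary string work. Here is a minimal sketch of that second step only, using nothing but the JDK. The class name `WordPresence`, the method `findWords`, and the sample text are made up for illustration, and the match is a simple case-insensitive substring check, so "pay" would also match "payment".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WordPresence {

    // For each word, record whether it occurs anywhere in the text.
    // Matching is a case-insensitive substring test, not a whole-word test.
    public static Map<String, Boolean> findWords(String text, List<String> words) {
        String haystack = text.toLowerCase(Locale.ROOT);
        Map<String, Boolean> result = new LinkedHashMap<>(); // keeps the input order
        for (String word : words) {
            result.put(word, haystack.contains(word.toLowerCase(Locale.ROOT)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: in a real program this string is the text extracted
        // from the PDF by a parser; here it is hard-coded sample text.
        String extractedText = "Invoice number 42. Payment is due within 30 days.";
        List<String> words = Arrays.asList("invoice", "refund", "payment");

        findWords(extractedText, words)
                .forEach((word, found) -> System.out.println(word + ": " + found));
    }
}
```

If you need whole-word matches instead, you could swap the `contains` call for a regular expression with word boundaries.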

What is a simple library that would let me solve this...

Since you have only 86 words and one document, you probably do not need an indexing tool like Lucene. However, if you want to build an application that supports different targets and different documents (especially if you need real free-text search), you probably need Lucene (or Solr) to index your documents first and then search using that index.
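To make the "index first, then search" idea concrete, here is a toy inverted index in plain Java. It is not Lucene and does not use Lucene's API; it only illustrates the principle that each document's text is tokenized once and later lookups go to the index rather than rescanning every document. The class `TinyInvertedIndex`, its methods, and the file names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TinyInvertedIndex {

    // Maps each lower-cased token to the names of the documents containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Tokenize the text on anything that is not a letter or digit
    // and register the document under every token.
    public void addDocument(String name, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(name);
            }
        }
    }

    // Return the documents containing the word (whole-token match, case-insensitive).
    public Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        // Assumption: these strings stand in for text already extracted from PDFs.
        index.addDocument("a.pdf", "Payment is due within 30 days.");
        index.addDocument("b.pdf", "Refund issued; payment reversed.");

        System.out.println(index.search("payment"));
        System.out.println(index.search("refund"));
    }
}
```

A real indexing tool adds much more on top of this (stemming, ranking, phrase queries, persistence), which is why it only pays off once you have many documents or need true free-text search.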



Source: https://stackoverflow.com/questions/16518694/search-for-text-in-pdf-files
