Searching for a string in PDF files

青春惊慌失措 2020-12-20 23:56

I am working on a school project that involves several PDF files. It needs a search-by-name feature where I just type in the student's name and all the PDF files wit

3 Answers
  •  不思量自难忘°
    2020-12-21 00:34

    I think your task may be split as follows:

    • Build an index of the PDF files
    • Write some code that uses the index to locate the relevant PDFs whenever a search is performed
    • Write some code that opens the PDF that was found, or shows a warning if nothing was found

    To build the index you may use an integrated solution such as Apache Lucene or Lucene.Net, or convert each PDF into text and build the index from that text yourself.
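
If you build the index yourself, one minimal approach is an inverted index that maps each lowercased word to the set of files containing it. The sketch below is in Python purely for illustration (the idea carries over to C# unchanged) and assumes the text of each PDF has already been extracted by whatever library you choose. `tokenize`, `build_index`, and the sample file names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(texts_by_file):
    """Map each word to the set of file names whose text contains it.

    texts_by_file: dict of {file name: extracted plain text}. Extraction
    itself is out of scope here and is assumed to have happened already.
    """
    index = defaultdict(set)
    for file_name, text in texts_by_file.items():
        for word in tokenize(text):
            index[word].add(file_name)
    return dict(index)


# Hypothetical sample data standing in for text extracted from PDFs.
sample_texts = {
    "report_a.pdf": "Student: Jane Doe. Grade: A",
    "report_b.pdf": "Student: John Smith. Grade: B",
}
index = build_index(sample_texts)
```

Lowercasing at indexing time means the search side only has to lowercase the query to get case-insensitive matching.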

    The other two steps are fairly trivial and depend on the language/technology used in the first step.
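
As a rough illustration of the lookup and warning steps, here is a sketch in Python, assuming the index is a dictionary from lowercased word to a set of file names. `find_pdfs`, `search_message`, and the sample index are hypothetical. A multi-word name matches only files that contain every word of the name.

```python
import re


def find_pdfs(index, name):
    """Return sorted file names whose text contains every word in `name`."""
    words = re.findall(r"\w+", name.lower())
    if not words:
        return []
    # Intersect the file sets of all words; a missing word yields no matches.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


def search_message(index, name):
    """Describe the result: the files to open, or a warning if none matched."""
    found = find_pdfs(index, name)
    if not found:
        return f"Warning: no PDF found for '{name}'"
    return "Open: " + ", ".join(found)


# Hypothetical hand-written index standing in for one built from real PDFs.
sample_index = {
    "jane": {"report_a.pdf"},
    "doe": {"report_a.pdf", "report_c.pdf"},
    "john": {"report_b.pdf"},
}
```

Actually opening the matched file is left out because it depends on the platform and UI framework in use.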

    Your question is tagged as related to .NET, so you may try the Docotic.Pdf library for building the index (disclaimer: I work for Bit Miracle).

    Docotic.Pdf can extract text from PDF files either as plain text or as a collection of text chunks with coordinates for each chunk.
