Indexing PDF with Solr

一向 2020-12-31 05:46

Can anyone point me to a tutorial?

My main experience with Solr is indexing CSV files. But I cannot find any simple instructions/tutorial to tell me what I need to d

6 answers
  •  粉色の甜心
    2020-12-31 05:56

    The hardest part of this is getting the metadata out of the PDFs. A tool like Aperture simplifies this, and there must be tonnes of similar tools.

    Aperture is a Java framework for extracting and querying full-text content and metadata from PDF files.

    Aperture grabbed the metadata from the PDFs and stored it in XML files.

    I parsed the XML files using lxml and posted them to Solr.
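    The parse-and-post step could look roughly like the sketch below. It is not the answerer's actual script. The XML layout (a root element with `id`, `title`, `author`, and `content` children), the field names, and the core name `pdfs` are all hypothetical, and Aperture's real output will differ, so the field mapping would need adjusting. The sketch uses lxml when it is installed and falls back to the standard library's ElementTree, which offers the same calls used here.

```python
import json
import urllib.request

try:
    from lxml import etree  # the parser mentioned in the answer
except ImportError:
    # Stdlib fallback; fromstring() and findtext() behave the same for this sketch.
    import xml.etree.ElementTree as etree

# Hypothetical field names: adjust to whatever the extraction tool actually emits
# and to the fields defined in your Solr schema.
FIELDS = ("id", "title", "author", "content")


def xml_to_doc(xml_bytes):
    """Turn one metadata XML file (as bytes) into a flat dict for Solr."""
    root = etree.fromstring(xml_bytes)
    doc = {}
    for name in FIELDS:
        value = root.findtext(name)  # text of the first direct child with this tag
        if value is not None:
            doc[name] = value.strip()
    return doc


def build_update_request(docs, update_url):
    """Build a JSON POST for Solr's update handler (a JSON array of documents)."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_solr(docs, update_url):
    """Send the documents to Solr and return the HTTP status code."""
    with urllib.request.urlopen(build_update_request(docs, update_url)) as resp:
        return resp.status
```

    Assuming a local Solr with a core named `pdfs`, usage would be along the lines of `post_to_solr([xml_to_doc(open(path, "rb").read())], "http://localhost:8983/solr/pdfs/update?commit=true")`. Committing on every request is convenient for a small test run; for a large batch it is usually better to send many documents per request and commit once at the end.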
