Indexing PDF with Solr

一向 2020-12-31 05:46

Can anyone point me to a tutorial?

My main experience with Solr is indexing CSV files. But I cannot find any simple instructions/tutorial to tell me what I need to d

6 answers
  •  情歌与酒
    2020-12-31 05:46

    You could use the DataImportHandler. The DataImportHandler is registered in solrconfig.xml, and its own configuration goes in a separate XML config file (data-config.xml).
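
    As a rough sketch of that registration, the fragment below shows what the solrconfig.xml entry typically looks like. It assumes a Solr release that still bundles the DataImportHandler contrib; the `<lib>` paths are assumptions about a standard install layout and may differ on your system.

    ```xml
    <!-- solrconfig.xml (fragment) -->
    <!-- Assumed jar locations; adjust dir to match your installation. -->
    <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
    <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

    <!-- Register the handler and point it at the separate config file. -->
    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>
    ```

    With this in place, an import is normally triggered by requesting /dataimport?command=full-import on the core.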

    To index PDFs, you could either:

    1.) crawl the directory to find all the PDFs using the FileListEntityProcessor, or

    2.) read the list of PDFs from a "content/index" XML file using the XPathEntityProcessor.

    Once you have the list of PDFs, use the TikaEntityProcessor to extract their content. See http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/ (an example with PPT) and the question "Solr : data import handler and solr cell".
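
    Putting the directory crawl (option 1) and the Tika step together, a data-config.xml might look roughly like the sketch below. The base directory /path/to/pdfs and the target field names (id, content, title, author) are placeholders and must match fields defined in your own schema.

    ```xml
    <!-- data-config.xml (sketch) -->
    <dataConfig>
      <!-- Binary data source so Tika can read the raw file bytes. -->
      <dataSource type="BinFileDataSource" name="bin" />
      <document>
        <!-- Outer entity: lists the PDF files; it emits no documents itself. -->
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/path/to/pdfs" fileName=".*\.pdf" recursive="true"
                rootEntity="false" dataSource="null">
          <!-- Hypothetical choice: use the absolute path as the unique key. -->
          <field column="fileAbsolutePath" name="id" />

          <!-- Inner entity: runs Tika on each file found above. -->
          <entity name="pdf" processor="TikaEntityProcessor"
                  url="${files.fileAbsolutePath}" format="text" dataSource="bin">
            <field column="text" name="content" />
            <field column="title" name="title" meta="true" />
            <field column="Author" name="author" meta="true" />
          </entity>
        </entity>
      </document>
    </dataConfig>
    ```

    For option 2, the outer entity would instead use the XPathEntityProcessor to read file paths from the XML file and pass each path to the same inner TikaEntityProcessor entity.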
