Indexing PDF with Solr

一向 2020-12-31 05:46

Can anyone point me to a tutorial?

My main experience with Solr is indexing CSV files. But I cannot find any simple instructions/tutorial to tell me what I need to d

6 answers

粉色の甜心 (OP)

2020-12-31 05:56

The hardest part of this is getting the metadata out of the PDFs; a tool like Aperture simplifies this. There must be tonnes of these tools.

Aperture is a Java framework for extracting and querying full-text content and metadata from PDF files.

Aperture grabbed the metadata from the PDFs and stored it in XML files.

I parsed the XML files using lxml and posted them to Solr.
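For anyone wanting a starting point, here is a rough sketch of the parse-and-post step in Python. It is not the code I actually ran. Several things in it are assumptions: the XML layout in `SAMPLE_XML` is made up (the files Aperture writes will look different), the core name `pdfs` and the URL are placeholders for a local Solr install, and the field names (`id`, `title`, `author`, `content`) would need to exist in your schema. To keep it dependency-free it uses the standard library's `xml.etree.ElementTree` rather than lxml; `lxml.etree` has a largely compatible API if you prefer it.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Hypothetical metadata layout; the real files written by Aperture will differ.
SAMPLE_XML = """<document>
  <id>report-001</id>
  <title>Annual Report</title>
  <author>J. Smith</author>
  <content>Full text extracted from the PDF.</content>
</document>"""

# Assumed local Solr core named "pdfs"; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/pdfs/update?commit=true"


def parse_metadata(xml_text):
    """Turn one metadata XML document into a flat dict of field -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON POST request for Solr's update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_solr(docs, url=SOLR_UPDATE_URL):
    """Send the documents to Solr and return the HTTP status code."""
    with urllib.request.urlopen(build_update_request(docs, url)) as response:
        return response.status
```

With Solr running, something like `post_to_solr([parse_metadata(SAMPLE_XML)])` would index one document. Building the request separately from sending it makes it easy to inspect the JSON payload before anything reaches the server.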
