Indexing Word Documents and PDFs with Sphinx

醉话见心 2020-12-14 11:43

I have a website where users upload documents in .doc and .pdf format. I am using Sphinx to run full-text searches on my MySQL database. What is the best way to

3 answers
  •  醉话见心
    2020-12-14 12:38

    Has anyone used Apache Tika to index other types of documents, much as the Solr plugin does?

    Some links:

    1. pdftotext ships in the poppler or poppler-utils package on Linux
    2. antiword seems to handle only the older .doc format, not the newer .docx
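
One way to wire the two tools above together is sketched here in Python. It is only a sketch under assumptions: `pdftotext` and `antiword` are assumed to be installed and on the PATH, and the helper names `extraction_command` and `extract_text` are made up for illustration. The extracted text could then be saved in a text column that the Sphinx source query selects, though that step is not shown.

```python
import subprocess
from pathlib import Path


def extraction_command(path):
    """Return the argv list for the tool that converts `path` to plain text.

    Hypothetical helper: it picks the converter from the file extension only.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # pdftotext writes to stdout when the output file is given as "-".
        return ["pdftotext", str(path), "-"]
    if suffix == ".doc":
        # antiword prints the document text to stdout by default.
        return ["antiword", str(path)]
    # .docx and everything else is not handled by either tool.
    raise ValueError(f"unsupported file type: {suffix!r}")


def extract_text(path):
    """Run the external converter and return whatever text it produced."""
    result = subprocess.run(
        extraction_command(path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # The output encoding can depend on the tool and locale, so undecodable
    # bytes are replaced rather than allowed to abort the indexing run.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("upload.pdf")` on an uploaded file would return its text as a string. A file the tools cannot read raises `subprocess.CalledProcessError`, which the upload handler would need to catch.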
