I have a website where users upload documents in .doc and .pdf format. I am using Sphinx to conduct full text searches on my SQL database (MySQL). What is the best way to
The method I use for this is pdf2text and antiword. I use both of these to dump the contents of the pdfs and word documents into the database. From there it's easy to crawl with Sphinx.