Content extraction of PDF files in Solr using Apache Tika
I am trying to index a PDF file in Solr by following this tutorial: http://wiki.apache.org/solr/ExtractingRequestHandler

But every time I run the command

java -jar post.jar *.pdf

it fails with this error:

org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3

Please help me index the PDF into the Solr server. Is there any integration other than Tika that could help?

post.jar is just a utility for uploading files to Solr. Run without options, it posts to the default update handler, which most likely tries to read the binary PDF as text and fails with the UTF-8 error. PDFs are handled by Solr's extract handler, so you need to pass that handler's URL and the content type explicitly, e.g.

java -Durl=http://localhost:8983/solr/update/extract?literal.id=1 -Dtype=application/pdf -jar
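
If you would rather not go through post.jar, the same request can be sketched in a few lines of Python. This is only a rough illustration, assuming Solr is running at the localhost URL used above and that the extract handler is enabled; the function names build_extract_url and post_pdf are made up for this example, and the commit=true parameter is an addition that is not part of the command above.

```python
import urllib.parse
import urllib.request

# Assumed endpoint, taken from the command above; adjust to your Solr instance.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, base_url=EXTRACT_URL, commit=True):
    """Build the extract-handler URL, passing the document id as literal.id."""
    params = {"literal.id": doc_id}
    if commit:
        # Assumption: commit right away so the document becomes searchable.
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_pdf(path, doc_id, base_url=EXTRACT_URL):
    """POST the raw PDF bytes to the extract handler and return the HTTP status."""
    with open(path, "rb") as f:
        data = f.read()
    request = urllib.request.Request(
        build_extract_url(doc_id, base_url),
        data=data,
        # Same content type that -Dtype=application/pdf sets for post.jar.
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_pdf("example.pdf", "1") (with a placeholder file name) should return 200 if Solr accepted the file. The important parts are the /update/extract path and the application/pdf content type; sending the same bytes to the plain /update handler is what produces the UTF-8 error.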