I'm trying to parse a few PDF files that contain engineering drawings to obtain the text data in the files. I tried using Tika as a jar with Python and using it with the jnius
You need to download the Tika Server Jar and run it first. Check this link: http://wiki.apache.org/tika/TikaJAXRS
java -jar tika-server-x.x.jar --port xxxx
In your code, you no longer need to call tika.initVM(). Add
tika.TikaClientOnly = True
instead of tika.initVM()
Then change
parsed = parser.from_file('/path/to/file')
to
parsed = parser.from_file('/path/to/file', '/path/to/server')
You will get the server path when the Tika server starts up (the java -jar step above); just plug that in here. Good luck!
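Putting the steps above together, a minimal sketch could look like the following. It assumes the tika Python package is installed and that the server from the java -jar step is already running. The helper names build_endpoint and extract_text are made up for illustration, and the host and port (localhost, 9998) are placeholders for whatever address your server reports at startup.

```python
def build_endpoint(host="localhost", port=9998):
    """Build the server URL to pass to parser.from_file.

    Hypothetical helper: host and port are assumptions; use the
    address your Tika server prints when it starts.
    """
    return f"http://{host}:{port}"


def extract_text(pdf_path, endpoint):
    """Send pdf_path to an already running Tika server and return its text."""
    # Imported here so the module loads even where tika is not installed.
    import tika

    # Client-only mode, used instead of tika.initVM() as described above.
    tika.TikaClientOnly = True
    from tika import parser

    # The second argument is the server address.
    parsed = parser.from_file(pdf_path, endpoint)

    # The result is a dict; "content" can be None when no text was extracted.
    return parsed.get("content") or ""
```

With the server running you would call something like extract_text('/path/to/file', build_endpoint()). Note that drawings stored as scanned images may come back with empty content, since there is no embedded text to extract.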