Python: send a file to Tika running as a service

梦想的初衷 posted on 2019-12-06 12:38:52

Question


With reference to this question, I would like to send an MS Word (.doc) file to a Tika application running as a service. How can I do this?

There is this link on running Tika in server mode: http://mimi.kaktusteam.de/blog-posts/2013/02/running-apache-tika-in-server-mode/

But I am not sure how the Python code should access it: should I use sockets, urllib, or something else?


Answer 1:


For remote access to Tika, there are basically two methods available. One is the Tika JAXRS Server, which provides a full RESTful interface. The other is the simple Tika-App --server mode, which just works at the network pipe level.

For production use, you'll probably want the Tika JAXRS server, as it's more fully featured. For simple testing and getting started, the Tika App in server mode ought to be fine.

For the Tika App in server mode, just connect to the port that the Tika-App is listening on, stream it your document data, and read the HTML back. For example, in one terminal run:

$ java -jar tika-app-1.3.jar --server --port 1234

Then, in another, run:

$ nc 127.0.0.1 1234 < test.pdf

You'll then see the HTML that Tika returns for your test PDF.

From Python, you just need a simple socket call that does much the same as netcat does there: send over the binary data, then read back your result. For example, try something like:

#!/usr/bin/env python3
import socket
import sys

# Where to connect
host = '127.0.0.1'
port = 1234

if len(sys.argv) < 2:
    print("Must give filename")
    sys.exit(1)

filename = sys.argv[1]
print("Sending %s to Tika on port %d" % (filename, port))

# Connect to Tika
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))

# Open the file in binary mode and stream it to Tika
with open(filename, 'rb') as f:
    while True:
        chunk = f.read(65536)
        if not chunk:
            # EOF
            break
        s.sendall(chunk)

# Tell Tika we have sent everything
s.shutdown(socket.SHUT_WR)

# Get the response and write the raw bytes to stdout
while True:
    chunk = s.recv(65536)
    if not chunk:
        # EOF
        break
    sys.stdout.buffer.write(chunk)

s.close()
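
For the Tika JAXRS server route mentioned above, something along these lines with urllib might work. This is only a rough sketch: the function name tika_put is made up for illustration, and the default URL (port 9998, the /tika resource) and the Content-Type and Accept headers are assumptions about a typical tika-server setup, so check them against the server you are actually running.

```python
import urllib.request


def tika_put(path, url="http://127.0.0.1:9998/tika", accept="text/html"):
    """PUT the file at `path` to a Tika server and return the raw response body.

    Assumptions (not from the original answer): the server listens on
    port 9998 and exposes a /tika resource that accepts PUT requests.
    """
    # Read the document as binary data
    with open(path, "rb") as f:
        data = f.read()

    req = urllib.request.Request(url, data=data, method="PUT")
    # Generic content type, so the server is left to detect the real
    # format itself (assumption about how the server treats this header)
    req.add_header("Content-Type", "application/octet-stream")
    # Ask for the response format we would like back
    req.add_header("Accept", accept)

    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Calling tika_put("test.doc") returns the response as bytes, which you can decode or write to a file as needed.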


Source: https://stackoverflow.com/questions/19361254/python-send-file-to-tika-running-as-a-service
