Use Tika with Python: RuntimeError: Unable to start Tika server

礼貌的吻别 2020-12-10 11:33

I am trying to use the tika package to parse files. Tika installed successfully, and I ran tika-server-1.18.jar from cmd with the command Java -jar tika-server-1.1

4 Answers
  • 2020-12-10 11:44

    According to Apache Tika's site, all new versions of the tika-server.jar will require Java 8.

    24 April 2018: Apache Tika Release Apache Tika 1.18 has been released! This release includes bug fixes (e.g. extraction from grouped shapes in PPT), security fixes and upgrades to dependencies. PLEASE NOTE: The next versions will require Java 8. Please see the CHANGES.txt file for the full list of changes in the release and have a look at the download page for more information on how to obtain Apache Tika 1.18.

    The outdated docs for the tika Python library still claim that Java 7 is needed, but Java 8 must now be installed. This is because the library automatically downloads the current version of tika-server.jar at runtime if it is not found in your temp directory, and that current version requires Java 8.

    After installing Java 8, my basic test code launched the server and worked without error.
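
A quick way to check which Java the Python process will actually pick up is to parse the output of `java -version`. This is only a rough sketch: the helper names `parse_java_major` and `installed_java_major` are made up for illustration, and it assumes the usual `version "..."` line that Oracle and OpenJDK builds print.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_281" means Java 8. Newer scheme: "11.0.2" means Java 11.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version`; return the major version, or None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr; stdout is included just in case.
    return parse_java_major(result.stderr + result.stdout)
```

If `installed_java_major()` returns `None` or a number below 8, that would be consistent with the server failing to start as described above.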

  • 2020-12-10 11:58

    After you import tika, you need to initialize the Java server:

    import tika
    tika.initVM()
    from tika import parser
    parsed = parser.from_file('')  # file name should be here
    
  • 2020-12-10 12:02

    You have not passed an argument (specified a file) in your line:

    parsed = parser.from_file('')

    Give it a file to chew on, e.g.:

    parsed = parser.from_file('myfile.txt')
    

    The server didn't start, and presumably the no-log warning gets triggered; see line 644 in the source on GitHub.

    Then another error message tells you it ain't going to play...
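
To catch the empty-filename case before Tika is involved at all, you could wrap the call in a small helper along these lines. Treat it as a sketch rather than the library's recommended usage: `extract_text` and its `from_file` argument are hypothetical names (the argument exists only so the helper can be exercised without Java), and it assumes `parser.from_file` returns a dict with a `'content'` key.

```python
import os


def extract_text(path, from_file=None):
    """Parse `path` with Tika and return its text content (may be None)."""
    if not path or not os.path.isfile(path):
        # Fail early with a clear message instead of a confusing server error.
        raise FileNotFoundError(f"No such file: {path!r}")
    if from_file is None:
        # Imported lazily so the path check above runs without Tika or Java.
        from tika import parser
        from_file = parser.from_file
    parsed = from_file(path)
    return parsed.get("content")
```

Calling `extract_text('myfile.txt')` then either raises `FileNotFoundError` straight away or hands a real file to Tika.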

  • 2020-12-10 12:08

    Download Java. If you already have a version of Java installed, try updating it to the latest version. The version that works for me is 1.18.
