NLTK was unable to find the gs file

Submitted by 谁说胖子不能爱 on 2020-01-22 10:38:25

Question


I'm trying to use NLTK, the Natural Language Toolkit. After installing the required files, I ran the demo code from http://www.nltk.org/index.html:

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]
>>> entities = nltk.chunk.ne_chunk(tagged)
>>> entities

Then I get this message:

LookupError: 

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.

I searched Google, but nobody explains what the missing gs file is.


Answer 1:


I came across this error too.

gs stands for ghostscript. You get the error because your chunker is trying to use ghostscript to draw a parse tree of the sentence.

I was using IPython. To debug the issue, I set the traceback verbosity with the command %xmode verbose, which prints the local variables of each stack frame (see the full traceback below). The file names NLTK searches for are:

file_names=['gs', 'gswin32c.exe', 'gswin64c.exe']

A quick Google search for gswin32c.exe told me it was ghostscript.

/Users/jasonwirth/anaconda/lib/python3.4/site-packages/nltk/__init__.py in find_file_iter(filename='gs', env_vars=['PATH'], searchpath=(), file_names=['gs', 'gswin32c.exe', 'gswin64c.exe'], url=None, verbose=False)
    517                         (filename, url))
    518         div = '='*75
--> 519         raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
    520 
    521 def find_file(filename, env_vars=(), searchpath=(),

LookupError: 

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.
===========================================================================
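Before changing anything, it can help to check whether any of those executables is visible on PATH at all. The following is a minimal sketch using only the standard library's shutil.which. It is not NLTK's own lookup code, and the function name find_ghostscript is made up for this example; only the three file names come from the traceback above.

```python
import shutil

# File names taken from the traceback above; NLTK looks for each of them on PATH.
GS_CANDIDATES = ("gs", "gswin32c.exe", "gswin64c.exe")


def find_ghostscript(candidates=GS_CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None.

    Hypothetical helper for diagnosis only; it does not call into NLTK.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_ghostscript()
    if location is None:
        print("No ghostscript executable found on PATH")
    else:
        print("ghostscript found at", location)
```

If this reports that nothing was found, NLTK will most likely fail in the same way, and the fix is either to install ghostscript or to put its directory on PATH.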



Answer 2:


A small addition to Jason Wirth's answer: on Windows, this line of code searches for "gswin64c.exe" in the directories listed in the PATH environment variable. However, the ghostscript installer does not add the binary to PATH, so for this to work you need to find where ghostscript is installed and add its bin subfolder to PATH.

For example, in my case I added C:\Program Files\gs\gs9.19\bin to PATH.
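If you would rather not edit the system settings, PATH can also be extended for the current Python process only. This is a rough sketch, assuming NLTK reads PATH at the moment it looks for gs (the traceback shows env_vars=['PATH'], which suggests so). The helper name prepend_to_path is hypothetical, and the directory is the one from my machine, so yours will differ.

```python
import os


def prepend_to_path(directory, environ=os.environ):
    """Put `directory` at the front of PATH unless it is already listed.

    Hypothetical helper; it changes PATH for this process only and
    returns the resulting PATH string.
    """
    parts = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    environ["PATH"] = os.pathsep.join(parts)
    return environ["PATH"]


if __name__ == "__main__":
    # Example install location from this answer; adjust it to your own setup.
    prepend_to_path(r"C:\Program Files\gs\gs9.19\bin")
```

Call it before evaluating the tree. The change is lost when the interpreter exits, so a permanent fix still means editing PATH in the system settings.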




Answer 3:


Just to add to the previous answers: if you replace 'entities' with 'print(entities)', you won't get the error.

Without print(), the console/notebook tries to "draw" the tree object, and that drawing step is what needs ghostscript. print() only asks for the text form of the tree.




Answer 4:


In addition to Alex Kinman's answer: I still get the same error, even after installing ghostscript and adding it to the nltk path. Using print() lets me print the entities, so despite the error I can get the output below, but unfortunately no tree yet.

Tree('S', [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'NN'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN'), Tree('PERSON', [('Arthur', 'NNP')]), ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'), ('very', 'RB'), ('good', 'JJ'), ('.', '.')]) 
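If the goal is the entities themselves rather than a picture, the structure printed above can be walked directly, with no ghostscript involved. Here is a small sketch, assuming the tree is flat in the way the output above is: plain (word, tag) tuples at the top level, plus labelled subtrees such as PERSON whose children are also (word, tag) tuples. The function name named_entities is made up for this example. It relies only on subtrees having a label() method, which nltk.Tree provides.

```python
def named_entities(tree):
    """Return (label, text) pairs for each labelled chunk directly under the root.

    Hypothetical helper: assumes top-level children are either
    (word, tag) tuples or subtrees exposing .label() whose own
    children are (word, tag) tuples.
    """
    found = []
    for node in tree:
        # Chunk subtrees expose .label(); plain (word, tag) tuples do not.
        if hasattr(node, "label"):
            words = [word for word, _tag in node]
            found.append((node.label(), " ".join(words)))
    return found
```

With the tree above, named_entities(entities) should pick out the single PERSON chunk containing Arthur and skip every other token.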



Answer 5:


If ghostscript is not available for your platform, or fails to install, you can also use the wonderful networkx package to visualize such trees:

import matplotlib.pyplot as plt
import networkx as nx
import nltk
# graphviz_layout needs Graphviz and the pygraphviz package installed
from networkx.drawing.nx_agraph import graphviz_layout

def drawNodes(G, nodeLabels, parent, lvl=0):
    """Recursively add the nodes and edges of an nltk.Tree to graph G."""
    def addNode(G, nodeLabels, label):
        # Nodes are numbered consecutively; labels are stored separately
        n = G.number_of_nodes()
        G.add_node(n)
        nodeLabels[n] = label
        return n

    def findNode(nodeLabels, label):
        # Travel backwards from end to find right parent
        for i in reversed(range(len(nodeLabels))):
            if nodeLabels[i] == label:
                return i

    if lvl == 0:
        addNode(G, nodeLabels, parent.label())
    for node in parent:
        if isinstance(node, nltk.Tree):
            n = addNode(G, nodeLabels, node.label())
            G.add_edge(findNode(nodeLabels, parent.label()), n)
            drawNodes(G, nodeLabels, node, lvl + 1)
        else:
            # Leaf: a (word, tag) tuple. The tag hangs under the parent, the word under the tag.
            print(node)
            n1 = addNode(G, nodeLabels, node[1])
            n0 = addNode(G, nodeLabels, node[0])
            G.add_edge(findNode(nodeLabels, parent.label()), n1)
            G.add_edge(n0, n1)

# entities is the tree returned by nltk.chunk.ne_chunk(tagged) in the question
G = nx.Graph()
nodeLabels = {}
drawNodes(G, nodeLabels, entities)
options = {
    'node_color': 'white',
    'node_size': 100,
}
plt.figure(1, figsize=(12, 6))
pos = graphviz_layout(G, prog='dot')
nx.draw(G, pos, font_weight='bold', arrows=False, **options)
nx.draw_networkx_labels(G, pos, nodeLabels)
plt.show()



Source: https://stackoverflow.com/questions/36942270/nltk-was-unable-to-find-the-gs-file
