English grammar for parsing in NLTK

Question

Is there a ready-to-use English grammar that I can just load and use in NLTK? I've searched for examples of parsing with NLTK, but it seems that I have to specify a grammar manually before parsing a sentence.

Thanks a lot!

Answer 1:

You can take a look at pyStatParser, a simple Python statistical parser that returns NLTK parse trees. It comes with public treebanks and generates the grammar model only the first time you instantiate a Parser object (in about 8 seconds). It uses a CKY algorithm and parses average-length sentences (like the one below) in under a second.

>>> from stat_parser import Parser
>>> parser = Parser()
>>> print(parser.parse("How can the net amount of entropy of the universe be massively decreased?"))
(SBARQ
  (WHADVP (WRB how))
  (SQ
    (MD can)
    (NP
      (NP (DT the) (JJ net) (NN amount))
      (PP
        (IN of)
        (NP
          (NP (NNS entropy))
          (PP (IN of) (NP (DT the) (NN universe))))))
    (VP (VB be) (ADJP (RB massively) (VBN decreased))))
  (. ?))
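
To make the CKY step mentioned above more concrete, here is a minimal sketch of a CKY recognizer. It is not pyStatParser's code: the toy grammar, the lexicon, and the function name cky_recognize are all hypothetical, and a statistical parser would additionally attach probabilities learned from a treebank to each rule and keep back-pointers to rebuild the tree.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# Keys are (left child, right child); values are the possible parent labels.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VB", "NP"): {"VP"},
}
# Hypothetical lexicon mapping each word to its possible tags.
LEXICON = {
    "the": {"DT"},
    "dog": {"NN"},
    "cat": {"NN"},
    "chased": {"VB"},
}


def cky_recognize(words, start="S"):
    """Return True if the toy grammar derives `words` from `start`."""
    n = len(words)
    # chart[(i, j)] holds every label that can span words[i:j].
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON.get(word, ()))
    # Fill the chart bottom-up, from short spans to long ones.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # try every split point
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY_RULES.get((left, right), set())
    return start in chart[(0, n)]


print(cky_recognize("the dog chased the cat".split()))
print(cky_recognize("dog the chased cat the".split()))
```

The chart has one cell per span, so the running time grows with the cube of the sentence length, which fits the observation that longer sentences take noticeably longer to parse.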

Answer 2:

My library, spaCy, provides a high-performance dependency parser.

Installation:

pip install spacy
python -m spacy download en_core_web_sm

Usage:

import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u'A whole document.\nNo preprocessing required.   Robust to arbitrary formatting.')
for sent in doc.sents:
    for token in sent:
        if token.is_alpha:
            print(token.orth_, token.tag_, token.head.lemma_)

Choi et al. (2015) found spaCy to be the fastest dependency parser available. It processes over 13,000 sentences a second, on a single thread. On the standard WSJ evaluation it scores 92.7%, over 1% more accurate than any of CoreNLP's models.

Answer 3:

There is a library called Pattern. It is quite fast and easy to use.

>>> from pattern.en import parse
>>> s = 'The mobile web is more important than mobile apps.'
>>> s = parse(s, relations=True, lemmata=True)
>>> print(s)

'The/DT/B-NP/O/NP-SBJ-1/the mobile/JJ/I-NP/O/NP-SBJ-1/mobile' ...
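
The tagged string above packs several annotations into each token, separated by slashes. As a rough sketch of how you might unpack it with plain Python, the snippet below splits each token into named fields. The field names are my own reading of the sample output (word, part of speech, chunk, prepositional chunk, relation, lemma), not names taken from Pattern itself, and split_tagged is a hypothetical helper. It assumes no token contains a literal slash.

```python
# Field names are an assumption based on the sample output, not Pattern's API.
FIELDS = ("word", "pos", "chunk", "pnp", "relation", "lemma")


def split_tagged(tagged):
    """Turn 'The/DT/B-NP/O/NP-SBJ-1/the ...' into a list of dicts."""
    return [dict(zip(FIELDS, token.split("/"))) for token in tagged.split()]


sample = "The/DT/B-NP/O/NP-SBJ-1/the mobile/JJ/I-NP/O/NP-SBJ-1/mobile"
for token in split_tagged(sample):
    print(token["word"], token["pos"], token["lemma"])
```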

Answer 4:

There are a few grammars in the nltk_data distribution. In your Python interpreter, issue nltk.download().

Answer 5:

Use MaltParser. It comes with a pretrained English model, as well as models for some other languages. MaltParser is a dependency parser, not a simple bottom-up or top-down parser.

Just download MaltParser from http://www.maltparser.org/index.html and use it from NLTK like this:

import nltk
parser = nltk.parse.malt.MaltParser()

Answer 6:

I've tried NLTK, pyStatParser, and Pattern. In my opinion, Pattern is the best English parser of those mentioned above, because it can be installed with pip and has good documentation on its website (http://www.clips.ua.ac.be/pages/pattern-en). I couldn't find adequate documentation for NLTK; with its defaults it gave me inaccurate results, and I couldn't work out how to tune it. pyStatParser was much slower than described above in my environment: initialization took about one minute, and parsing long sentences took a couple of seconds. Maybe I didn't use it correctly.

Answer 7:

Did you try POS tagging in NLTK?

import nltk
from nltk.tokenize import word_tokenize
text = word_tokenize("And now for something completely different")
nltk.pos_tag(text)

The output looks something like this:

[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

I took this example from NLTK_chapter03.

Source: https://stackoverflow.com/questions/6115677/english-grammar-for-parsing-in-nltk

Tags

python

nlp

grammar

nltk