Semantic parsing with NLTK

Question

I am trying to use NLTK for semantic parsing of spoken navigation commands such as "go to San Francisco", "give me directions to 123 Main Street", etc.

This could be done with a fairly simple CFG such as

S -> COMMAND LOCATION
COMMAND -> "go to" | "give me directions to" | ...
LOCATION -> CITY | STREET | ...

The problem is that this involves non-atomic (multi-word) literals such as "go to", which NLTK doesn't seem to be set up for (correct me if I am wrong). The parsing task has tagging as a prerequisite, and all taggers seem to tag individual words only. So, my options seem to be:

a) Define a custom tagger that can assign non-syntactic tags to word sequences rather than individual words (e.g., "go to" : "COMMAND").

b) Use features to augment the grammar, e.g., something like:

COMMAND -> VB[sem='go'] P[sem='to'] | ...

c) Use a chunker to extract sub-structures like COMMAND, then apply a parser to the result. Does NLTK allow chunker->parser cascading?

Some of these options seem convoluted (hacks). Is there a good way?
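
For reference, here is a minimal sketch of the grammar above in NLTK, assuming each multi-word literal can be written as a sequence of single-word terminals, so that the chart parser works on plain tokens and no tagging step is involved. The city list and the `parse_command` helper are made up for illustration and are not part of the original grammar.

```python
import nltk

# Toy grammar (illustrative): every multi-word literal is spelled out
# as consecutive single-word terminals.
grammar = nltk.CFG.fromstring("""
S -> COMMAND LOCATION
COMMAND -> 'go' 'to' | 'give' 'me' 'directions' 'to'
LOCATION -> CITY
CITY -> 'san' 'francisco' | 'boston'
""")

parser = nltk.ChartParser(grammar)


def parse_command(text):
    """Lowercase, split on whitespace, and return all parse trees.

    Note: parser.parse raises ValueError if a token is not covered
    by the grammar, so real input would need an open vocabulary.
    """
    tokens = text.lower().split()
    return list(parser.parse(tokens))


trees = parse_command("go to San Francisco")
print(trees[0])
```

This only covers closed vocabularies; street addresses and arbitrary place names would still need something beyond a hand-written terminal list.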

Answer 1:

It seems like you want to identify imperatives.

This answer has looked into that and contains a solution similar to your option (a), but a bit different since it lets the tagger do most of the work. (b) indeed seems a bit hacky... but you're creating a pretty custom application, so it could work! I would do (c) the other way around - parsing and then chunking based on the CFG in (a).
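
A rough sketch of the idea behind option (a), assuming the multi-word phrases are known in advance: NLTK's `MWETokenizer` merges them into single tokens, which can then act as atomic terminals in the CFG. This is a pre-tokenization step rather than a true tagger, and the phrase list, grammar, and `parse_with_mwe` helper are hypothetical.

```python
import nltk
from nltk.tokenize import MWETokenizer

# Illustrative phrase list: each tuple is merged into one token
# joined by the separator, e.g. ("go", "to") -> "go_to".
mwe = MWETokenizer(
    [
        ("go", "to"),
        ("give", "me", "directions", "to"),
        ("san", "francisco"),
    ],
    separator="_",
)

# The grammar can now treat each merged phrase as a single terminal.
grammar = nltk.CFG.fromstring("""
S -> COMMAND LOCATION
COMMAND -> 'go_to' | 'give_me_directions_to'
LOCATION -> CITY
CITY -> 'san_francisco' | 'boston'
""")

parser = nltk.ChartParser(grammar)


def parse_with_mwe(text):
    """Merge known multi-word phrases, then parse the merged tokens."""
    tokens = mwe.tokenize(text.lower().split())
    return tokens, list(parser.parse(tokens))


tokens, trees = parse_with_mwe("give me directions to San Francisco")
print(tokens)
print(trees[0])
```

The design trade-off is that the phrase inventory lives outside the grammar, so the two have to be kept in sync by hand.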

Overall, however, as the other answer explains, there doesn't seem to be a perfect way to do this just yet.

You might also want to look at pattern.en. Their