Extracting 'useful' information out of sentences?

后端 未结 2 1446
离开以前
离开以前 2020-12-30 12:01

I am currently trying to understand sentences of this form:

The problem was more with the set-top box than the television. Restarting the set-top box solved t

2条回答
  •  心在旅途
    2020-12-30 12:43

    Probably, if the sentences are well-formed, I would experiment with dependency parsing (http://nltk.googlecode.com/svn/trunk/doc/api/nltk.parse.malt.MaltParser-class.html#raw_parse). That gives you a graph of the constituents of a sentence and you can tell the relations between the lexical items. Later, you can extract phrases from the output of a dependency parser (http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html#code-cfg2) That could help you to extract the direct object of a sentence, or the verb phrase in a sentence.

    If you just want to get phrases or "chunks" from a sentence, you can try chunk parser (http://nltk.googlecode.com/svn/trunk/doc/api/nltk.chunk-module.html). You can also carry out named entity recognition (http://streamhacker.com/2009/02/23/chunk-extraction-with-nltk/). It's usually used to extract instances of places, organizations or people names but it could work in your case as well.

    Assuming that you solve the problem of extracting noun/verb phrases from a sentence, you may need to filter them out to ease the job of your domain expert (too many phrases could overwhelm a judge). You may carry out a frequency analysis on your phrases, remove very frequent ones that are not usually related to the problem domain, or compile a white-list and keep the phrases that contain a pre-defined set of words, etc.

提交回复
热议问题