Parser to parse search terms and extract valuable information [closed]

Question

I would like to understand a user's search term. Suppose someone searches for "staples in NY": I would like to recognize that it's a location search where the keyword is "staples" and the location is New York. On the other hand, if someone types "cat in hat", the parser should not flag it as a location search; there the entire keyword is "cat in hat". Is there an algorithm or open source library that can parse a search term and determine whether it's a comparison (like A vs B) or a location-based search (like A in X)?

Answer 1:

The problem you describe is called information extraction. Many algorithms exist for it, the simplest being regexp matching and the best being structured machine learning. Try regexps first, and look at something like NLTK if you know Python.
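As a rough illustration of starting with regexps, here is a minimal Python sketch. The two patterns and the `classify` function are hypothetical, written only for this example. It shows why regexps are a quick first step and also where they stop: on their own they label "cat in hat" as a location search.

```python
import re

# Hypothetical patterns for the two query shapes in the question.
# Lazy groups (.+?) make each side stop at the first operator word.
COMPARISON = re.compile(r"^\s*(?P<a>.+?)\s+(?:vs\.?|versus)\s+(?P<b>.+?)\s*$", re.IGNORECASE)
LOCATION = re.compile(r"^\s*(?P<keyword>.+?)\s+in\s+(?P<location>.+?)\s*$", re.IGNORECASE)


def classify(query: str) -> dict:
    """Return a rough query type using regular expressions only."""
    m = COMPARISON.match(query)
    if m:
        return {"type": "comparison", "a": m.group("a"), "b": m.group("b")}
    m = LOCATION.match(query)
    if m:
        # This also fires on "cat in hat": a regexp alone cannot tell
        # whether the text after "in" is really a place.
        return {"type": "location", "keyword": m.group("keyword"), "location": m.group("location")}
    return {"type": "keyword", "keyword": query.strip()}
```

For example, `classify("iPhone vs Android")` yields a comparison with `a="iPhone"` and `b="Android"`, while `classify("best pizza")` falls through to a plain keyword search.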

Distinguishing "staples in NY" from "cat in hat" is possible if your program knows that "NY" is a location. It can tell either from the capitalization or because "NY" occurs in a list of place names called a gazetteer.
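A minimal sketch of the gazetteer check, assuming a tiny hand-written dictionary (`GAZETTEER`) in place of a real place-name list, and treating an all-caps word as a weak capitalization signal. The function name `parse_location_query` is hypothetical.

```python
# A tiny, hypothetical gazetteer; a real one would be loaded from a
# place-name dataset and would be far larger.
GAZETTEER = {
    "ny": "New York",
    "nyc": "New York",
    "new york": "New York",
    "london": "London",
}


def parse_location_query(query: str):
    """Split 'A in X' into (keyword, location) only if X looks like a place.

    X counts as a place if it appears in the gazetteer (case-insensitively)
    or, as a weaker heuristic, if it is written in capitals (e.g. 'NJ').
    Returns None when the query is not treated as a location search.
    """
    # Split on the last ' in ' so a keyword that itself contains 'in' stays whole.
    head, sep, tail = query.rpartition(" in ")
    if not sep or not head.strip() or not tail.strip():
        return None
    keyword, place = head.strip(), tail.strip()
    canonical = GAZETTEER.get(place.lower())
    if canonical is not None:
        return keyword, canonical
    if place.isupper():
        # Capitalization heuristic: an all-caps word such as an abbreviation.
        return keyword, place
    return None
```

With this, "staples in NY" becomes `("staples", "New York")`, "cat in hat" returns `None`, and "cat in hat in NY" keeps "cat in hat" as the keyword.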

The problem in general is AI-complete, so expect to put in lots of hard work if you want good results.

Answer 2:

You should write such linguistic rules with a rule-based grammar tool such as GATE or http://code.google.com/p/graph-expression/. For example: Token+ in (LocationLookup).
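To show what a rule like `Token+ in (LocationLookup)` expresses, it can be mimicked in plain Python. This is not the API of GATE or graph-expression; `LOCATION_LOOKUP` and `match_rule` are hypothetical, and anchoring the lookup at the end of the query is a simplifying assumption. Storing lookup entries as token tuples lets multi-word place names such as "new york" match.

```python
# Hypothetical lookup list of place names, stored as token tuples so that
# multi-word entries can be matched token by token.
LOCATION_LOOKUP = {("ny",), ("new", "york"), ("san", "francisco")}


def match_rule(query: str):
    """Emulate  Token+ in (LocationLookup)  over lowercase whitespace tokens.

    Matches when one or more tokens are followed by the literal token 'in'
    and the remaining tokens form exactly one lookup entry.
    Returns (keyword_tokens, location_tokens) or None.
    """
    tokens = query.lower().split()
    # 'in' must have at least one token before it (Token+) and one after it.
    for i in range(1, len(tokens) - 1):
        if tokens[i] != "in":
            continue
        rest = tuple(tokens[i + 1:])
        if rest in LOCATION_LOOKUP:
            return tokens[:i], list(rest)
    return None
```

Because every "in" is tried, "cat in hat in new york" still matches, with `["cat", "in", "hat"]` as the keyword tokens.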

Answer 3:

I'm not entirely sure, but based on my experience with parsing, here are two approaches:

Define a grammar that can parse the expression and collect values/parameters. You might also build a dictionary of keywords, which you can then use to deduce the type of search.
Be strict when defining your grammar, so that the expression itself tells you the type of search, e.g. LOC: A in B, VALUE: $ to Euro, etc.
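Both approaches can be combined in a small sketch: keyword dictionaries decide which production may apply, and the grammar is strict enough that each query shape maps to one search type. The grammar, the dictionaries, and the `parse`/`ParsedQuery` names are assumptions for illustration only, as is reading "VALUE: $ to Euro" as a currency-conversion query.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword dictionaries; operator words propose a search type,
# and the operand dictionaries decide whether the production really applies.
OPERATORS = {"vs": "COMPARISON", "versus": "COMPARISON", "in": "LOC", "to": "VALUE"}
PLACES = {"ny", "nyc", "london"}
CURRENCIES = {"$", "usd", "euro", "eur", "gbp"}


@dataclass
class ParsedQuery:
    kind: str                     # "COMPARISON", "LOC", "VALUE" or "KEYWORD"
    left: str
    right: Optional[str] = None


def parse(query: str) -> ParsedQuery:
    """Parse a query against a deliberately strict grammar:

        COMPARISON := TERM ("vs" | "versus") TERM
        LOC        := TERM "in" PLACE
        VALUE      := CURRENCY "to" CURRENCY
        KEYWORD    := anything else
    """
    tokens = query.split()
    for i, tok in enumerate(tokens):
        kind = OPERATORS.get(tok.lower())
        # An operator needs a non-empty operand on each side.
        if kind is None or i == 0 or i == len(tokens) - 1:
            continue
        left = " ".join(tokens[:i])
        right = " ".join(tokens[i + 1:])
        if kind == "COMPARISON":
            return ParsedQuery(kind, left, right)
        if kind == "LOC" and right.lower() in PLACES:
            return ParsedQuery(kind, left, right)
        if kind == "VALUE" and left.lower() in CURRENCIES and right.lower() in CURRENCIES:
            return ParsedQuery(kind, left, right)
    # No production matched strictly, so treat the whole query as a keyword.
    return ParsedQuery("KEYWORD", query.strip())
```

Here "cat in hat" and "how to cook rice" both fall through to KEYWORD because their operands are not in the dictionaries. For a larger grammar, a parser generator is the sturdier route.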

For a parser generator, see ANTLR, or jcup with jflex.

Source: https://stackoverflow.com/questions/6416595/parser-to-parse-search-terms-and-extract-valuable-information

Tags

algorithm

parsing

nlp

information-extraction