Parser to parse search terms and extract valuable information [closed]

Submitted by ╄→гoц情女王★ on 2019-12-06 15:59:36

Question


I would like to understand a user's search term. Suppose someone searches for "staples in NY": I would like to recognize that it is a location search where the keyword is "staples" and the location is New York. Similarly, if someone types "cat in hat", the parser should not flag it as a location search; here the entire keyword is "cat in hat". Is there an algorithm or open source library that can parse a search term and determine whether it is a comparison (like A vs B) or a location-based search (like A in X)?


Answer 1:


The problem you describe is called information extraction. Many algorithms exist: the simplest is regular-expression matching, and the best-performing are structured machine learning methods. Try regexps first, and look at something like NLTK if you know Python.

Distinguishing "staples in NY" from "cat in hat" is possible if your program knows that "NY" is a location. You can tell either from the capitalization or because "NY" appears in a list of place names, called a gazetteer.
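A minimal sketch of the regexp-plus-gazetteer idea, in Python. The gazetteer contents, the `parse_query` name, and the shape of the returned dict are assumptions made for illustration; a real gazetteer would be far larger and would need to handle ambiguous names.

```python
import re

# Hypothetical, tiny gazetteer: lowercase place names mapped to a canonical name.
GAZETTEER = {"ny": "New York", "new york": "New York", "london": "London"}

# "<keyword> in <candidate place>"; the lazy keyword group splits at the first " in ".
LOCATION_PATTERN = re.compile(
    r"^(?P<keyword>.+?)\s+in\s+(?P<place>.+)$", re.IGNORECASE
)


def parse_query(query):
    """Classify a query as a location search or a plain keyword search."""
    query = query.strip()
    match = LOCATION_PATTERN.match(query)
    if match:
        # Only accept the match if the text after "in" is a known place.
        place = GAZETTEER.get(match.group("place").lower())
        if place is not None:
            return {
                "type": "location",
                "keyword": match.group("keyword"),
                "location": place,
            }
    # Anything else is treated as one keyword, e.g. "cat in hat".
    return {"type": "keyword", "keyword": query}
```

With this gazetteer, `parse_query("staples in NY")` is classified as a location search, while `parse_query("cat in hat")` stays a keyword search because "hat" is not a known place. Because the pattern splits at the first " in ", a query with several occurrences of "in" would need a more careful rule.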

The problem in general is AI-complete, so expect to put in lots of hard work if you want good results.




Answer 2:


You should write such linguistic rules as grammars in a tool such as GATE or http://code.google.com/p/graph-expression/. Example rule: Token+ in (LocationLookup).
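To make the rule `Token+ in (LocationLookup)` concrete, here is a rough Python sketch that applies it over whitespace tokens. It does not use GATE or graph-expression; `LOCATION_LOOKUP` and `match_location_rule` are hypothetical names, and the lookup list is a toy stand-in for a real location dictionary.

```python
# Hypothetical lookup list: each location is a tuple of lowercase tokens.
LOCATION_LOOKUP = {("ny",), ("new", "york"), ("san", "francisco")}


def match_location_rule(query):
    """Apply `Token+ in (LocationLookup)`.

    Returns (keyword, location) when the rule matches, otherwise None.
    """
    tokens = query.split()
    lowered = [token.lower() for token in tokens]
    # Try every "in" that has at least one token before it and one after it.
    for i in range(1, len(tokens) - 1):
        if lowered[i] == "in" and tuple(lowered[i + 1:]) in LOCATION_LOOKUP:
            return " ".join(tokens[:i]), " ".join(tokens[i + 1:])
    return None
```

Working on tokens rather than on the raw string lets the rule try each "in" in turn, so "stay in hotel in NY" is split at the last "in", and multi-word locations such as "San Francisco" are looked up as a whole.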




Answer 3:


I'm not entirely sure, but from my experience with parsing there are two approaches:

  1. Define a grammar that can parse the expression and collect values / parameters. You might want to build a dictionary of keywords from which you can then deduce the type of search.

  2. Be strict when defining your grammar so that the expression itself tells you the type of search, e.g. LOC: A in B, VALUE $ to Euro, etc.
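The second approach can be sketched in Python with one regular expression per search type. The tags `CMP:` and `CONV:`, the reading of "VALUE $ to Euro" as a currency conversion, and the `parse_strict` helper are all assumptions made for illustration; only `LOC: A in B` comes from the answer itself. A generated parser (see below) would replace the regexps in a real system.

```python
import re

# Hypothetical strict syntax: an explicit tag names the type of search.
RULES = [
    ("location", re.compile(
        r"^LOC:\s*(?P<keyword>.+?)\s+in\s+(?P<location>.+)$")),
    ("comparison", re.compile(
        r"^CMP:\s*(?P<left>.+?)\s+vs\.?\s+(?P<right>.+)$")),
    ("conversion", re.compile(
        r"^CONV:\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<source>\S+)"
        r"\s+to\s+(?P<target>\S+)$")),
]


def parse_strict(query):
    """Return the search type and its parameters; untagged input is a keyword."""
    query = query.strip()
    for search_type, pattern in RULES:
        match = pattern.match(query)
        if match:
            return {"type": search_type, **match.groupdict()}
    return {"type": "keyword", "keyword": query}
```

Because the tag decides the type, no gazetteer is needed: `parse_strict("LOC: staples in NY")` is a location search by construction, and an untagged "cat in hat" falls through as a keyword. The trade-off is that users must learn and type the tags.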

For building the parser, see ANTLR or jcup with jflex.



Source: https://stackoverflow.com/questions/6416595/parser-to-parse-search-terms-and-extract-valuable-information
