How to make the Lucene QueryParser more forgiving?

て烟熏妆下的殇ゞ 提交于 2019-12-02 20:15:30

Yo can make Lucene ignore the special characters by sanitizing the query with something like

query = QueryParser.Escape(query)

If you do not want your users to ever use advanced syntax in their queries, you can do this always.

If you want your users to use advanced syntax but you also want to be more forgiving with the mistakes you should only sanitize after a ParseException has occured.

Well, the easiest thing to do would be to give the raw form of the query a shot, and if that fails, fall back to cleaning it up.

Query safe_query_parser(QueryParser qp, String raw_query)
  throws ParseException
{
  Query q;
  try {
    q = qp.parse(raw_query);
  } catch(ParseException e) {
    q = null;
  }
  if(q==null)
    {
      String cooked;
      // consider changing this "" to " "
      cooked = raw_query.replaceAll("[^\w\s]","");
      q = qp.parse(cooked);
    }
  return q;
}

This gives the raw form of the user's query a chance to run, but if parsing fails, we strip everything except letters, numbers, spaces and underscores; then we try again. We still risk throwing ParseException, but we've drastically reduced the odds.

You could also consider tokenizing the user's query yourself, turning each token into a term query, and glomming them together with a BooleanQuery. If you're not really expecting your users to take advantage of the features of the QueryParser, that would be the best bet. You'd be completely(?) robust, and users could search for whatever funny characters will make it through your analyzer

josefresno

FYI... Here is the code I am using for .NET

private Query GetSafeQuery(QueryParser qp, String query)
{
    Query q;
    try 
    {
        q = qp.Parse(query);
    } 

    catch(Lucene.Net.QueryParsers.ParseException e) 
    {
        q = null;
    }

    if(q==null)
    {
        string cooked;

        cooked = Regex.Replace(query, @"[^\w\.@-]", " ");
        q = qp.Parse(cooked);
    }

    return q;
}

I'm in the same situation as you.

Here's what I do. I do catch the exception, but only so that I can make the error look prettier. I don't change the text.

I also provide a link to an explanation of the Lucene syntax which I have simplified a little bit:
http://ifdefined.com/btnet/lucene_syntax.html

I do not know much about Lucene.net. For general Lucene, I highly recommend the book Lucene in Action. For the question at hand, it depends on your users. There are strong reasons, such as ease of use, security and performance, to limit your users' queries. The book shows ways to parse the queries using a custom parser instead of QueryParser. I second Jay's idea about the BooleanQuery, although you can build stronger queries using a custom parser.

If you don't need all Lucene features, you might go better by writing your own query parser. It's not as complicated as it might seem in the first place.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!