Improving performance of Location based search using Lucene

Posted by 一个人想着一个人 on 2019-12-06 11:40:02

Basically, you have two types of search parameters: textual and spatial. You can probably use one type to filter the results you got from the other. For example, for someone looking for a .NET developer job near Atlanta, GA, you could either first retrieve all the .NET developer jobs and filter them by location, or retrieve all jobs around Atlanta and filter for the .NET developer ones. I believe the first should be faster.

You can also store the job locations directly in Lucene and incorporate them in the search. A rough draft is:

Indexing:
1. When you receive a new 'wanted' ad, find its geo-location using the database.
2. Store the location as a Lucene field in the ad's document.

Retrieval:
1. Retrieve all jobs according to textual matches.
2. Use geometrical calculations to find the distance between the user's location and each job's location.
3. Filter jobs according to distance.
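Retrieval steps 2 and 3 can be sketched without touching Lucene at all. The sketch below is only an illustration: it assumes each textual hit carries a latitude and longitude read back from stored fields, and it picks the haversine great-circle formula as the "geometrical calculation", which the answer above does not specify. The names JobHit, DistanceFilter, HaversineKm and WithinRadius are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a job returned by the textual query.
// Latitude/Longitude are assumed to come from stored fields on the ad's document.
public sealed class JobHit
{
    public string Title { get; }
    public double Latitude { get; }
    public double Longitude { get; }

    public JobHit(string title, double latitude, double longitude)
    {
        Title = title;
        Latitude = latitude;
        Longitude = longitude;
    }
}

public static class DistanceFilter
{
    private const double EarthRadiusKm = 6371.0; // mean Earth radius

    // Great-circle distance in km between two points given in decimal degrees
    // (haversine formula; an assumption, any distance measure could be swapped in).
    public static double HaversineKm(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding error before Asin.
        return 2 * EarthRadiusKm * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }

    // Keeps only the hits that lie within maxKm of the user's position.
    public static List<JobHit> WithinRadius(
        IEnumerable<JobHit> hits, double userLat, double userLon, double maxKm)
    {
        return hits
            .Where(h => HaversineKm(userLat, userLon, h.Latitude, h.Longitude) <= maxKm)
            .ToList();
    }

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```

For a user in Atlanta (roughly 33.749, -84.388), calling DistanceFilter.WithinRadius(hits, 33.749, -84.388, 50.0) would keep a job in Marietta, GA and drop one in Savannah, GA. Because the filter runs only over the textual hits, its cost grows with the number of matches rather than with the size of the index.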

Lucene in Action has an example of spatial search similar in spirit. A second edition is in the making. Also, check out Sujit Pal's suggestions for spatial search with Lucene and Patrick O'Leary's framework. There are also Locallucene and LocalSolr, but I do not know how mature they are.

My index size is about 4 MB. I am using the following code to build the query for the nearest cities:

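// Build one (city AND state) sub-query per nearby city and OR them all together.
foreach (string city in htNearestCities.Keys)
{
    cityStateQuery = new BooleanQuery();

    // Quote the values so that multi-word names are parsed as phrases.
    queryCity = queryParserCity.Parse("\"" + city + "\"");
    queryState = queryParserState.Parse("\"" + ((string[])htNearestCities[city])[1] + "\"");

    // Both the city and the state must match within this sub-query.
    cityStateQuery.Add(queryCity, BooleanClause.Occur.MUST);
    cityStateQuery.Add(queryState, BooleanClause.Occur.MUST);

    // Any one of the city/state pairs may match.
    findLocationQuery.Add(cityStateQuery, BooleanClause.Occur.SHOULD);
}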

You may ultimately want to have Lucene handle the spatial search by indexing tiles. But if you are certain that the Lucene query is the slow part, rather than the lookup of the nearby cities, then start by indexing the state and city together. This is much like indexing multiple columns in a relational database: a single 'state:city' field with values like 'GA:Atlanta'. That way the city/state intersection no longer has to be computed at query time.
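A minimal sketch of building such combined keys follows, assuming the nearest-city table can be reduced to (city, state) pairs; that is a simplified, hypothetical shape, not the Hashtable of string arrays used in the question. StateCityKey, Build and BuildAll are illustrative names. The same Build function would have to be used when indexing and when querying so that the keys match exactly.

```csharp
using System;
using System.Collections.Generic;

public static class StateCityKey
{
    // Builds the combined key, e.g. ("GA", "Atlanta") -> "GA:Atlanta".
    // Only surrounding whitespace is trimmed; casing is left as given.
    public static string Build(string state, string city)
    {
        return state.Trim() + ":" + city.Trim();
    }

    // Turns (city -> state) pairs into the list of keys to look up,
    // one key per nearby city.
    public static List<string> BuildAll(IEnumerable<KeyValuePair<string, string>> cityToState)
    {
        var keys = new List<string>();
        foreach (var pair in cityToState)
        {
            keys.Add(Build(pair.Value, pair.Key));
        }
        return keys;
    }
}
```

Each key could then become one exact-match clause (for example a TermQuery added with SHOULD) in place of the nested two-clause BooleanQuery per city. This assumes the combined field is indexed without analysis, so that 'GA:Atlanta' stays a single term. Note that ':' is a special character in Lucene's query parser syntax, so a key passed through a QueryParser would need escaping; constructing the term query directly avoids that.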
