Solr associations

此生再无相见时 submitted on 2019-12-03 07:41:29

Basically you have a design decision here. The usual thing people do with Solr indexes is to denormalize them, i.e. explode the category definition into each business document. As you do not want to do this, I suggest keeping two types of documents: one for the businesses and another for the categories. You can keep both in the same index, as Solr does not require all documents to have the same fields. The business documents seem straightforward, but you have to make them searchable by both the business name and the category id. For the categories, I suggest creating one category document per synonym, so that you can search by synonym and find the id (and category name).
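To make the two document types concrete, here is a minimal Python sketch of how the documents might be shaped before they are sent to Solr. The field names (`doc_type`, `category_id`, `category_name`, `synonym`), the id scheme and the sample business are assumptions made for illustration; they are not taken from the original schema.

```python
def category_docs(category_id, category_name, synonyms):
    """Build one category document per synonym (hypothetical field names)."""
    return [
        {
            "id": f"cat-{category_id}-{i}",
            "doc_type": "category",
            "category_id": category_id,
            "category_name": category_name,
            "synonym": synonym,
        }
        for i, synonym in enumerate(synonyms)
    ]


def business_doc(business_id, name, category_id):
    """Build a business document that stores only the category id."""
    return {
        "id": f"biz-{business_id}",
        "doc_type": "business",
        "name": name,
        "category_id": category_id,
    }


# Both document types can live in the same index.
docs = category_docs(1, "software", ["software", "IT"]) + [
    business_doc(42, "Acme Consulting", 1)
]
```

Renaming a category or adding a synonym then only touches the small category documents; the business documents stay as they are.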

To search using synonyms, you will need a double search:

  • Search for the category id using the name's text (the synonym).
  • Search for businesses using that category id.
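
The two steps above could be sketched as two sets of request parameters for Solr's select handler. This is only a sketch: the field names `synonym`, `doc_type` and `category_id` are hypothetical, and actually sending the requests and reading the ids out of the first response is left out.

```python
from urllib.parse import urlencode


def quote_phrase(text):
    """Wrap text in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def category_lookup_params(text):
    """Step 1: find category documents whose synonym matches the text."""
    return {
        "q": f"synonym:{quote_phrase(text)}",
        "fq": "doc_type:category",
        "fl": "category_id,category_name",
    }


def business_search_params(category_ids):
    """Step 2: find businesses that belong to any of the category ids."""
    if not category_ids:
        raise ValueError("at least one category id is required")
    ids = " OR ".join(str(i) for i in category_ids)
    return {
        "q": f"category_id:({ids})",
        "fq": "doc_type:business",
    }


step1 = category_lookup_params("software")
step2 = business_search_params([1])  # ids would come from the step 1 response
query_string = urlencode(step2)
```

The cost of this design is one extra round trip per search, in exchange for never having to reindex the businesses when a category changes.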
CraftyFella

There is actually a filter class called solr.SynonymFilterFactory.

This should allow you to map each category's text equivalents to its category number, provided you use it in the query analyzer only. Something like the following:

    <fieldType name="category" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="category_Synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

That way you can index ONLY the category ID. This means you won't have to send all the businesses to Solr again when a category name changes. Also, if someone queries "software" or "IT", the query analyzer will map it to the category ID.

Your category_Synonyms.txt should have lines such as the following:

1, software, IT

The only drawback here is that you'll have to come up with a way of editing the text file when you change the names or synonyms. So I guess this will only help if you change the category names infrequently, unless someone else knows of a way that this can be done easily.
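
One way to soften that drawback is to generate the file from wherever the categories are stored rather than editing it by hand. The following is a minimal sketch under that assumption: the `categories` dictionary stands in for a real data source, and names containing commas are not handled. Solr would still need to reload the core to pick up the new file.

```python
def synonym_lines(categories):
    """Render one line per category: id first, then its names and synonyms."""
    lines = []
    for category_id, names in sorted(categories.items()):
        terms = [str(category_id)] + [n.strip() for n in names if n.strip()]
        lines.append(", ".join(terms))
    return lines


def write_synonyms(path, categories):
    """Write the synonyms file, one category per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(synonym_lines(categories)) + "\n")


# Hypothetical category data; category 2 is invented for the example.
categories = {1: ["software", "IT"], 2: ["bakery", "patisserie"]}
lines = synonym_lines(categories)
```

Calling `write_synonyms("category_Synonyms.txt", categories)` after each category change would keep the file in step with the source data.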

I actually added the above to my own Solr and ran the Analysis tool on it. The result shows that it turned software into 1.

Please note you MUST set the expand parameter to false.
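
To illustrate what that setting changes, here is a simplified Python model of how a comma-separated synonym line behaves with and without expansion. It is a sketch of the documented behaviour only, not Solr's implementation: it lowercases everything to mimic `ignoreCase`, and it ignores comment lines and explicit `=>` mappings.

```python
def build_synonym_map(lines, expand):
    """Simplified model of comma-separated synonym lines with ignoreCase."""
    mapping = {}
    for line in lines:
        terms = [t.strip().lower() for t in line.split(",") if t.strip()]
        if not terms:
            continue
        # expand=False: every term collapses to the first one in the line.
        # expand=True: every term is replaced by all terms in the line.
        outputs = terms if expand else terms[:1]
        for term in terms:
            mapping[term] = outputs
    return mapping


def rewrite(tokens, mapping):
    """Replace each token with its mapped terms; unknown tokens pass through."""
    result = []
    for token in tokens:
        result.extend(mapping.get(token.lower(), [token]))
    return result


rules = ["1, software, IT"]
collapsed = rewrite(["software"], build_synonym_map(rules, expand=False))
expanded = rewrite(["software"], build_synonym_map(rules, expand=True))
```

In this model `collapsed` contains only the category ID, while `expanded` also carries the text terms, which were never indexed.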

I hope this helps.

Dave

You cannot search on pieces of information that are not indexed, unless you implement some kind of query translation/expansion that translates some query terms into their indexed equivalents before submitting the query.

So, if the user types "restaurant", then your query is translated to include a filter by cat=1.

As far as I know Solr doesn't include this feature, so you have to implement it on your own or adapt a suitable module (like http://lucene-qe.sourceforge.net/).
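
A hand-rolled version of that translation step might look like the sketch below. The `cat` field comes from the example above; the lookup table and the choice to fall back to `*:*` when no free-text terms remain are assumptions. The remaining terms are passed through unescaped, which a real implementation would need to handle.

```python
def translate_query(user_query, term_to_cat):
    """Move known category terms out of the text query into a cat filter."""
    remaining = []
    cat_ids = []
    for term in user_query.split():
        cat_id = term_to_cat.get(term.lower())
        if cat_id is None:
            remaining.append(term)
        elif cat_id not in cat_ids:
            cat_ids.append(cat_id)
    # Match everything when only category terms were typed.
    params = {"q": " ".join(remaining) if remaining else "*:*"}
    if cat_ids:
        params["fq"] = "cat:(" + " OR ".join(str(c) for c in cat_ids) + ")"
    return params


TERM_TO_CAT = {"restaurant": 1}  # hypothetical lookup table
params = translate_query("cheap restaurant", TERM_TO_CAT)
```

Here the category term is turned into a filter and only the free-text part of the query is left for the main search.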

Other than some of the excellent ideas offered earlier, you can also look at multivalued fields. Your category field can then contain any number of values (updated when needed), and a search queries all of the values.
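
As a rough illustration of the multivalued idea, the sketch below builds a business document whose `category` field holds several values and models the "any value matches" behaviour in plain Python. The field names and sample data are assumptions, and the schema field would have to be declared as multivalued for Solr to accept the list.

```python
import json


def business_update(business_id, name, categories):
    """Build a document whose category field carries several values."""
    return {"id": business_id, "name": name, "category": list(categories)}


def matches_category(doc, term):
    """Model of a multivalued match: any one value matching is enough."""
    return term.lower() in (value.lower() for value in doc["category"])


doc = business_update("biz-42", "Acme Consulting", ["software", "IT"])
payload = json.dumps([doc])  # a JSON array of documents for an update request
```

Unlike the synonym file approach, changing a category name this way means updating every business document that carries it.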
