Question
I have a hyphenated word. In my case it is "re-use". I want to be able to match it for "re-use", "reuse" and "re use".
If I use a WordDelimiterFilterFactory with catenateAll=1, then it will transform "re-use" into "reuse". This doesn't cover the case of a search for "re use".
In addition to this, the word "re-use" is being used as a synonym with SynonymFilterFactory, so the solution would have to work with that too.
If my synonym file says "re-use => other thing", then I want to be able to match "other thing" when I type "re-use" or "reuse" or "re use". I have tried actually creating a synonym entry like "re use => re-use". This results in matching documents containing the non-hyphenated version, but doesn't then match "other thing" (I don't mind being extra-permissive about matching "re" or "use").
I could add a synonym for this word, but I'd like a general solution. Is there something obvious that I've missed?
EDIT:
I have 4 documents:
- "thing"
- "re use"
- "re-use"
- "reuse"
I want to search for any of these terms and return all the documents. The relevant bit of my schema:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
If my synonyms file looks like this, everything works as expected.
re use, reuse, thing
However, I want to represent that "re use" and "reuse" are synonyms. I also want to say that "reuse" and "thing", and lots of other things are synonyms. So I tried this:
re use, reuse
reuse, thing
This doesn't work. I think that lexk's answer suggested that it would?
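A possible workaround, sketched under the assumption that the two-line file fails because the synonym filter does not chain rules transitively (so "re use" never expands to "thing"): pre-process the synonyms file so that groups sharing a term are merged into one line, which reproduces the single-line form that works above. The helper name `merge_synonym_groups` is hypothetical and not part of Solr, and the sketch only handles comma-separated equivalence lines, not explicit `=>` mappings.

```python
def merge_synonym_groups(lines):
    """Merge comma-separated synonym groups that share at least one term.

    Hypothetical helper (not a Solr API). Assumes every line is a plain
    comma-separated equivalence group; '=>' mappings are not handled.
    """
    groups = []  # disjoint groups, each a list of terms in first-seen order
    for line in lines:
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if not terms:
            continue  # skip blank lines
        merged, kept = [], []
        for group in groups:
            if any(t in group for t in terms):
                # This older group overlaps the new line: absorb it.
                merged.extend(group)
            else:
                kept.append(group)
        # Append the new line's terms that are not already present.
        for t in terms:
            if t not in merged:
                merged.append(t)
        kept.append(merged)
        groups = kept
    return [", ".join(g) for g in groups]


if __name__ == "__main__":
    for line in merge_synonym_groups(["re use, reuse", "reuse, thing"]):
        print(line)
```

Running this on the two-line file yields one merged line that could be written back to synonyms.txt before re-indexing. Whether merging is acceptable depends on whether every term in a chain really should match every other term.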
Answer 1:
It's enough to define a synonym rule for re-use if you are doing index-time expansion. Say you have re-use. You first transform it to reuse, then apply the SynonymFilter, so that you get re-use, reuse, and 'other thing' at the same index position. When you search for 'other thing', you get a match regardless of how many variations of re-use you created.
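The idea of index-time expansion can be illustrated with a toy model in Python. This is not Solr code and only loosely imitates the analysis chain: `word_delimiter_variants` and `expand_at_index_time` are hypothetical names, the splitting rule is a simplification of what WordDelimiterFilterFactory does with catenateAll and preserveOriginal, and a multi-word synonym such as "other thing" is treated as a single term, whereas Solr would emit separate tokens.

```python
import re


def word_delimiter_variants(token):
    """Toy stand-in for word-delimiter splitting (assumption, not Solr's logic).

    Returns the original token, its alphanumeric parts, and the parts
    catenated together, without duplicates.
    """
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    variants = [token]
    for candidate in parts + ["".join(parts)]:
        if candidate and candidate not in variants:
            variants.append(candidate)
    return variants


def expand_at_index_time(token, synonyms):
    """Collect every term that would share the token's index position.

    `synonyms` maps a term to a list of extra terms (hypothetical format).
    """
    terms = []
    for variant in word_delimiter_variants(token.lower()):
        for term in [variant] + synonyms.get(variant, []):
            if term not in terms:
                terms.append(term)
    return terms


if __name__ == "__main__":
    print(expand_at_index_time("Re-use", {"reuse": ["other thing"]}))
```

In this model the synonym rule is keyed on the catenated form "reuse", so the hyphenated input picks up "other thing" as well, which is the effect the answer describes.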
Source: https://stackoverflow.com/questions/17952126/working-with-hyphenated-words-in-solr