Custom Solr analyzers not being used during indexing

Submitted by 本小妞迷上赌 on 2020-01-25 20:50:12

Question


I have a bunch of PDF files on my machine which I want to index in Solr. For this purpose, I have created a schema file with custom field types and user-defined fields.

Given below are the fields and copyFields in my schema.xml:

<field name="id" type="custom01" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="custom02" indexed="true" stored="true" multiValued="true"/>
<field name="fileEx" type="custom03" indexed="false" stored="true" multiValued="false"/>

<copyField source="id" dest="fileEx"/>

The id field will contain the full path of the indexed file. I plan to copy this value into fileEx and keep only the file extension there, using the custom analyzer given in that field's type definition.

The following are my custom fieldType definitions:

<fieldType name="custom01" class="solr.TextField"> <!-- Dummy fieldType -->
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^$"/>
  </analyzer>
</fieldType>

<fieldType name="custom02" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\.([^.]*$)" group="0"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement=""/>
  </analyzer>
</fieldType>

When I indexed the files with this schema, the contents of the id field were copied into fileEx unchanged, with no analysis applied: both id and fileEx had the same value. I checked my fieldTypes on the Analysis screen of the Solr Admin UI, and they work as expected there.

But for some reason, the analyzers don't seem to be running properly while indexing actual documents.

So, at this point I am stuck and frustrated. Any help regarding this will be much appreciated. TIA.


Answer 1:


Do I understand correctly that you're asking why the text returned for a hit hasn't changed? The value returned is the stored value, which is the original input before analysis, not the tokens the analyzer produced. Changing the analyzer therefore never changes the value you get back. This is required for things like highlighting to work properly.

If you want to change the text before it arrives in a field, use an update processor.
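As a rough sketch of what that could look like for the question's use case, the chain below (in solrconfig.xml) clones id into fileEx and then strips everything up to the last dot, so that only the extension is stored. The chain name extract-extension is made up for illustration, and the regex is my own assumption about the path format; adjust both to your setup.

```xml
<!-- Hypothetical chain name; select it with update.chain=extract-extension
     on the update request, or mark the chain default="true". -->
<updateRequestProcessorChain name="extract-extension">
  <!-- Copy the raw id value into fileEx before the document is indexed -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">id</str>
    <str name="dest">fileEx</str>
  </processor>
  <!-- Replace the whole path with the text after the last dot.
       Assumed pattern: values without a dot are left unchanged. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">fileEx</str>
    <str name="pattern">^.*\.([^.]*)$</str>
    <str name="replacement">$1</str>
    <!-- Needed so that $1 is treated as a group reference, not literal text -->
    <bool name="literalReplacement">false</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With a chain like this, the copyField from id to fileEx in schema.xml would probably have to go, since fileEx is single-valued and would otherwise receive a second value.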



Source: https://stackoverflow.com/questions/39301371/custom-solr-analyzers-not-being-used-during-indexing
