SOLR: how to copy data to another field with filtered values?

丶灬走出姿态 submitted on 2020-04-17 21:53:34

Question


I have a Price field in Solr with values like these:

"Price":"0.07 AUD"
"Price":"10.00"
"Price":"AUD"

So I need another custom field, CustomPrice.

To create it, I used copyField to copy the data from Price to CustomPrice.

However, I need only the numeric values in CustomPrice, like this:

"CustomPrice":"0.07"
"CustomPrice":"10.00"
"CustomPrice":"0"

I also need the CustomPrice field type to be pfloat, so that we can sort on the field numerically.
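Why a numeric type matters for sorting can be shown with a small Java sketch. It sorts the same price strings two ways: character by character, which is how a string field orders its values, and by parsed number, which is what a numeric field such as pfloat gives. The class and method names are made up for illustration, and "9.50" is a hypothetical extra value added so that the two orders differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PriceSortDemo {

    // Order the raw strings character by character, as a string field does.
    static List<String> lexicalOrder(List<String> prices) {
        List<String> sorted = new ArrayList<>(prices);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Order by the parsed number, as a numeric field such as pfloat does.
    static List<String> numericOrder(List<String> prices) {
        List<String> sorted = new ArrayList<>(prices);
        sorted.sort(Comparator.comparingDouble(Double::parseDouble));
        return sorted;
    }

    public static void main(String[] args) {
        // "9.50" is a hypothetical value; the others come from the question.
        List<String> prices = List.of("10.00", "0.07", "129.95", "9.50");
        System.out.println(lexicalOrder(prices)); // [0.07, 10.00, 129.95, 9.50]
        System.out.println(numericOrder(prices)); // [0.07, 9.50, 10.00, 129.95]
    }
}
```

In character order "9.50" ends up last because '9' sorts after '1'.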

I tried copyField, PatternTokenizerFactory and PatternReplaceFilterFactory to do this.

My earlier question, for reference: SOLR: How to sort by price when price not added properly?

So how can I create a new float field into which only the numeric values from the Price field are copied?

Below is the error I get when I set the new field's type to float:

{
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=12958142955618] Error adding field 'Price'='129.95 AUD' msg=For input string: \"129.95 AUD\"",
    "code":400}}
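The root error class above is java.lang.NumberFormatException. The JDK's own Float.parseFloat rejects the same kind of input, so it makes a convenient stand-in for reproducing the failure outside Solr. This is an illustration, not Solr's parsing code, and the class and method names are hypothetical: a value with a currency suffix, or with no number at all, does not parse, while a plain number does.

```java
public class ParseFailureDemo {

    // Returns the NumberFormatException message, or null if the value parses.
    static String parseError(String raw) {
        try {
            Float.parseFloat(raw);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Values from the question: only the purely numeric one parses.
        for (String raw : new String[] {"129.95 AUD", "10.00", "AUD"}) {
            String error = parseError(raw);
            System.out.println(raw + " -> " + (error == null ? "ok" : error));
        }
    }
}
```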

Answer 1:


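This can be achieved with Update Request Processors. An analyzer chain (PatternTokenizerFactory, PatternReplaceFilterFactory) cannot do it, because copyField copies the original input value rather than the analyzed tokens, and numeric field types such as pfloat do not take an analyzer at all; the value has to be cleaned up before it reaches the field. Every update request received by Solr is run through a chain of plugins known as Update Request Processors.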

This can be useful, for example, to add a field to the document being indexed; to change the value of a particular field; or to drop an update if the incoming document doesn’t fulfill certain criteria.
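The chain idea can be modeled in a few lines of Java. This is a hypothetical sketch, not Solr's UpdateRequestProcessor API: each step takes a document (here just a Map of field names to string values) and either returns it, possibly changed, or returns nothing to drop the update. The three example steps mirror the three uses listed above; the class names and the has_currency field are invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class ChainSketch {

    // Hypothetical model of one processor: returns the (possibly changed)
    // document, or Optional.empty() to drop the update. NOT Solr's API.
    interface Step extends Function<Map<String, String>, Optional<Map<String, String>>> {}

    // Pass the document through each step in order; once a step drops it,
    // flatMap keeps the result empty for all remaining steps.
    static Optional<Map<String, String>> run(List<Step> chain, Map<String, String> doc) {
        Optional<Map<String, String>> current = Optional.of(new HashMap<>(doc));
        for (Step step : chain) {
            current = current.flatMap(step);
        }
        return current;
    }

    static List<Step> exampleChain() {
        Step changeValue = doc -> {
            // Change the value of a field: trim whitespace around "price".
            doc.computeIfPresent("price", (name, value) -> value.trim());
            return Optional.of(doc);
        };
        Step addField = doc -> {
            // Add a field (hypothetical): does the price mention AUD?
            doc.put("has_currency", String.valueOf(doc.getOrDefault("price", "").contains("AUD")));
            return Optional.of(doc);
        };
        // Drop the update when the document has no id.
        Step dropIfNoId = doc -> doc.containsKey("id") ? Optional.of(doc) : Optional.empty();
        return List.of(changeValue, addField, dropIfNoId);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "12958142955618");
        doc.put("price", " 129.95 AUD ");
        System.out.println(run(exampleChain(), doc));
    }
}
```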

You can add a processor that does this.

The processor is:

<processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">price</str>
    <!-- remove every run of characters that is neither a digit nor a dot -->
    <str name="pattern">[^0-9.]+</str>
    <str name="replacement"></str>
    <bool name="literalReplacement">true</bool>
</processor>
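The effect of this pattern on the sample values can be checked with java.util.regex, which is what Solr compiles the pattern with. A minimal sketch, with hypothetical class and method names, assuming that literalReplacement=true behaves like quoting the replacement with Matcher.quoteReplacement:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripNonNumeric {

    // Same pattern as the processor config: one or more characters that are
    // neither a digit nor a dot.
    private static final Pattern NON_NUMERIC = Pattern.compile("[^0-9.]+");

    // Assumption: literalReplacement=true means the replacement is taken
    // literally, i.e. quoted with Matcher.quoteReplacement.
    static String strip(String raw) {
        return NON_NUMERIC.matcher(raw).replaceAll(Matcher.quoteReplacement(""));
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"0.07 AUD", "10.00", "AUD"}) {
            System.out.println("\"" + raw + "\" -> \"" + strip(raw) + "\"");
        }
        // Prints:
        // "0.07 AUD" -> "0.07"
        // "10.00" -> "10.00"
        // "AUD" -> ""
    }
}
```

"AUD" comes out as an empty string, which is still not a valid float; the later version of the configuration handles that with a second processor that replaces ^$ with 0.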

The processor can be added to the updateRequestProcessorChain as follows:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
    processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
    <processor class="solr.RegexReplaceProcessorFactory">
        <str name="fieldName">price</str>
        <str name="pattern">[^0-9.]+</str>
        <str name="replacement"></str>
        <bool name="literalReplacement">true</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Add this entry to your managed-schema file:

<field name="copyFloatPrice" type="float" indexed="true" stored="true" multiValued="false" docValues="true"/>
<copyField source="price" dest="copyFloatPrice"/>

When I query Solr with this setup, sorting on copyFloatPrice works.

Here are the changes I made.

The change in solrconfig.xml:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
    processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
    <!-- copy the raw price value into copyFloatPrice; price itself is left untouched -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">price</str>
        <str name="dest">copyFloatPrice</str>
    </processor>
    <!-- strip everything except digits and dots: "0.07 AUD" becomes "0.07", "AUD" becomes "" -->
    <processor class="solr.RegexReplaceProcessorFactory">
        <str name="fieldName">copyFloatPrice</str>
        <str name="pattern">[^0-9.]+</str>
        <str name="replacement"></str>
        <bool name="literalReplacement">true</bool>
    </processor>
    <!-- an empty result (no digits in the value) becomes "0" so it still parses as a float -->
    <processor class="solr.RegexReplaceProcessorFactory">
        <str name="fieldName">copyFloatPrice</str>
        <str name="pattern">^$</str>
        <str name="replacement">0</str>
        <bool name="literalReplacement">true</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
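To check what this chain produces for the sample values, here is a Java sketch that applies the same two regular expressions with java.util.regex and then parses the result as a float. It models the processors as plain method calls instead of using Solr's classes; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class CopyFloatPriceChain {

    // Pattern of the first RegexReplaceProcessorFactory.
    private static final Pattern NON_NUMERIC = Pattern.compile("[^0-9.]+");
    // Pattern of the second one: a completely empty value.
    private static final Pattern EMPTY = Pattern.compile("^$");

    // Value that ends up in copyFloatPrice for a given raw price.
    static String copyFloatPrice(String price) {
        // Step 1 (CloneFieldUpdateProcessorFactory): Java strings are immutable,
        // so working on this reference leaves the original price unchanged.
        String cloned = price;
        // Step 2: "0.07 AUD" -> "0.07", "AUD" -> ""
        String stripped = NON_NUMERIC.matcher(cloned).replaceAll("");
        // Step 3: "" -> "0"; non-empty values are left as they are
        return EMPTY.matcher(stripped).replaceAll("0");
    }

    // The float field then parses that string.
    static float indexedValue(String price) {
        return Float.parseFloat(copyFloatPrice(price));
    }

    // Order raw prices by the derived float, as sorting on copyFloatPrice does.
    static List<String> sortByCopyFloatPrice(List<String> prices) {
        List<String> sorted = new ArrayList<>(prices);
        sorted.sort(Comparator.comparingDouble(CopyFloatPriceChain::indexedValue));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> prices = List.of("0.07 AUD", "10.00", "AUD", "129.95 AUD");
        for (String price : prices) {
            System.out.println(price + " -> " + copyFloatPrice(price));
        }
        System.out.println(sortByCopyFloatPrice(prices)); // [AUD, 0.07 AUD, 10.00, 129.95 AUD]
    }
}
```

The pattern keeps every digit and every dot, so a value such as "1.2.3 AUD" would become "1.2.3" and still fail to parse; data like that would need a stricter pattern.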

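The changes made in the managed-schema file are below. No copyField is needed here: CloneFieldUpdateProcessorFactory already fills copyFloatPrice, whereas a copyField from price would copy the raw value (for example "0.07 AUD") into the float field and fail again.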

<field name="price" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="copyFloatPrice" type="float" indexed="true" stored="true" multiValued="false" docValues="true"/>

[Screenshot: Solr query response showing the indexed price and copyFloatPrice values, with results sorted by copyFloatPrice]

Source: https://stackoverflow.com/questions/60375355/solr-how-to-copy-data-to-another-field-with-filtered-values
