SOLR: Inconsistencies using splitBy to populate a multi valued field

烈酒焚心 提交于 2019-12-23 05:58:40

问题


I'm having trouble using the splitBy functionality to populate a multi valued field from a pipe delimited datasource. My implementation seems to partially work for one of the field and does not the work for the the other field. An example of my implementation below.

I have a db view with following data:

recordId relist dbaName

1 PA21|MD29 The Hong Kong Dragon|The Peeled Apple

My config:

<dataConfig>
    <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@myserver:1521:XE" user="MyUser" password="MyPassword"/>
    <document>
        <entity name="mentity" query="select * from MySampleView" transformer="RegexTransformer" >
            <field sourceColName="relist" column="relist" splitBy="\|"  />               
            <field sourceColName="dbaName" column="dbaName" splitBy="\|"  />
            <field column="recordId" name="recordId" />
        </entity>  
    </document>
</dataConfig>

My schema.xml snippet:

<fields>    
    <field name="relist"      type="string"    indexed="true"  stored="true"  multiValued="true" /> 
    <field name="dbaName"      type="string"    indexed="true"  stored="true"  multiValued="true" />
    <field name="recordId"        type="string"    indexed="true"  stored="true"  multiValued="false" />
</fields>

<uniqueKey>recordId</uniqueKey>

My expectation is that when the data is split and stored, that the document would look something like this:

"docs": [ 
  { 
    "relist": [ 
      "PA21", 
      "MD29" 
    ], 
    "recordId": "1", 
    "dbaName": [ 
      "The Hong Kong Dragon", 
      "The Peeled Apple" 
    ] 
  } 
] 

However, this is what I get:

"docs": [ 
  { 
    "relist": [ 
      "PA21", 
      "MD29", 
      "PA21|MD29" 
    ], 
    "recordId": "1", 
    "dbaName": [ 
      "The Hong Kong Dragon|The Peeled Apple" 
    ] 
  } 
] 

My issue:

  1. relist is being split but it also includes the original delimited value in the list
  2. dbaName is not being split

Any help would be appreciated.

Thanks


回答1:


I recently ran into this same issue, and discovered that if your field name has mixedCase, like dbaName, then splitby does not work without specifying the sourceColName. You can fix this by using a temporary name in your SQL query, like so:

    <entity name="mentity" query="select dbaName as dba_name from MySampleView" transformer="RegexTransformer" >
        <field sourceColName="dba_name" column="dbaName" splitBy="\|"  />
    </entity>  

The other option would be to call your database column dba_name and your field name dba_name. Then splitBy works just fine, without specifying sourceColName.

I found this quite confusing, so here is one more example, just to make it crystal clear to every one

This WILL NOT work:

select 'foo|bar' as genRes
<field column="genres" sourceColName="genRes" splitBy="\|" />

This WILL work

select 'foo|bar' as genres
<field column="genres" sourceColName="genres" splitBy="\|" />


来源:https://stackoverflow.com/questions/20253103/solr-inconsistencies-using-splitby-to-populate-a-multi-valued-field

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!