Solr DIH regextransformer - processes only one CSV line

Submitted by 主宰稳场 on 2019-12-06 14:17:48

Question


Hi, I have the following CSV file:

132 1536130302256087040
133 1536130302256087041
134 1536130302256087042

The fields are separated by a tab. I am using the DataImportHandler (DIH) to import the CSV into Solr, but only the first line gets indexed. This is the result; the other lines from the CSV are missing:

  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 1,
    "docs": [ {
        "string": "1536130302256087040",
        "id": "132",
        "_version_": 1536202153221161000
      } ] }

Here is my data-config.xml

<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
    <document>

     <entity name="f" 
     processor="FileListEntityProcessor" 
     fileName="myfile.csv" 
     baseDir="/var/www/solr-5.4.0/server/csv/files" 
     recursive="false" 
     rootEntity="true" 
     dataSource="null" >

     <entity 
     onError="continue" 
     name="jc"   
     processor="LineEntityProcessor" 
     url="${f.fileAbsolutePath}" 
     dataSource="fds"  
     rootEntity="true" 
     header="false"
     separator="\t"
     transformer="RegexTransformer" >

     <field column="id" name="id" sourceColName="rawLine" regex="^(.*)\t"/>
     <field column="string" name="string" sourceColName="rawLine" regex="\t(.*)$"/>

             </entity>            
        </entity>
    </document>
</dataConfig>

Here is my schema.xml

<field name="id" type="text_general" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="string" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>

 <uniqueKey>id</uniqueKey>

What am I doing wrong?


Answer 1:


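You have rootEntity="true" on both entity levels. DIH creates one Solr document per row of the root entity, and the outer FileListEntityProcessor entity yields one row per file, so you only get one document for the outer entity. Try setting rootEntity to false on the outer entity, so that each line read by the inner entity becomes its own document.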
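
A minimal sketch of that change, assuming everything else in the question's data-config.xml stays exactly as posted. The only edit is rootEntity="false" on the outer entity "f"; the comments are added here for illustration and are not part of the original config.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
  <document>

    <!-- Outer entity: lists the file. rootEntity is now false, so this
         level no longer produces the (single) Solr document. -->
    <entity name="f"
            processor="FileListEntityProcessor"
            fileName="myfile.csv"
            baseDir="/var/www/solr-5.4.0/server/csv/files"
            recursive="false"
            rootEntity="false"
            dataSource="null">

      <!-- Inner entity: unchanged from the question. It remains the root
           entity, so each line of the file should become one document. -->
      <entity onError="continue"
              name="jc"
              processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}"
              dataSource="fds"
              rootEntity="true"
              header="false"
              separator="\t"
              transformer="RegexTransformer">

        <field column="id" name="id" sourceColName="rawLine" regex="^(.*)\t"/>
        <field column="string" name="string" sourceColName="rawLine" regex="\t(.*)$"/>

      </entity>
    </entity>
  </document>
</dataConfig>
```

If this behaves as expected, a full import of the three-line sample file should report numFound of 3 rather than 1.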

Also, you can send tab-separated files straight to Solr's CSV update handler, passing a tab as the separator parameter; no DIH is required.



Source: https://stackoverflow.com/questions/37629261/solr-dih-regextransformer-processes-only-one-csv-line
