Efficiency aspect of delta import in Solr

Submitted by 吃可爱长大的小学妹 on 2019-12-12 04:11:37

Question


I have about 2,100,000 rows of data. A full import takes about 2 minutes. For any updates to the table I use delta import to index the changes, and the delta import takes 6 minutes.

From an efficiency standpoint, a full import is therefore the better choice. So what is the need for delta import? Is there a better way to use delta import that improves its efficiency?

I followed the steps in the documentation.

data-config.xml

<dataConfig>
<dataSource type="JdbcDataSource" driver="com.dbschema.CassandraJdbcDriver" url="jdbc:cassandra://127.0.0.1:9042/test" autoCommit="true" rowLimit="-1" batchSize="-1"/>
<document name="content">
    <entity name="test" query="SELECT * from person" deltaImportQuery="select * from person where seq=${dataimporter.delta.seq}" deltaQuery="select seq from person where last_modified &gt; '${dataimporter.last_index_time}' ALLOW FILTERING" autoCommit="true">
        <field column="seq" name="id" />
        <field column="last" name="last_s" />
        <field column="first" name="first_s" />
        <field column="city" name="city_s" />
        <field column="zip" name="zip_s" />
        <field column="street" name="street_s" />
        <field column="age" name="age_s" />
        <field column="state" name="state_s" />
        <field column="dollar" name="dollar_s" />
        <field column="pick" name="pick_s" />
    </entity>
</document>
</dataConfig>


Answer 1:


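The usual way of setting up delta indexing, as you did, runs two kinds of query instead of one: deltaQuery first collects the keys of the changed rows, and deltaImportQuery is then executed once for each of those keys. With many changed rows this can be slower than a full import, so in some cases it is not optimal.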

I prefer to set up delta so that there is a single query to maintain: it is cleaner, and the delta runs as a single query. You should try it, as it might improve things. The downside is deletes: you either do some form of soft deletion or you still need the usual delta configuration for them (I favour the former).
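The exact setup the answer has in mind is not shown here, so the following is only a sketch of one way to get a single query, under stated assumptions. It reuses the driver, URL, table and columns from the question and drives both full and delta runs through the `full-import` command. The `since` request parameter is a hypothetical name introduced for illustration, and whether the Cassandra driver accepts the timestamp literal in this form should be verified.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.dbschema.CassandraJdbcDriver"
              url="jdbc:cassandra://127.0.0.1:9042/test"/>
  <document name="content">
    <!-- One query for both modes. "since" is a hypothetical request
         parameter, read through ${dataimporter.request.since}.
         No deltaQuery or deltaImportQuery is defined. -->
    <entity name="test"
            query="SELECT * FROM person WHERE last_modified &gt; '${dataimporter.request.since}' ALLOW FILTERING">
      <field column="seq" name="id" />
      <field column="last" name="last_s" />
      <field column="first" name="first_s" />
      <!-- remaining field mappings as in the question's config -->
    </entity>
  </document>
</dataConfig>
```

Under these assumptions, a full rebuild would be requested with `command=full-import&clean=true&since=1970-01-01 00:00:00`, and an incremental run with `command=full-import&clean=false&since=<time of the previous run>`. With `clean=false` the existing documents are kept and only the returned rows are added or overwritten. The caller has to remember the time of the previous run, and deleted rows are not detected, which is where the soft-delete approach comes in.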

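Also, of course, make sure the last_modified column is properly indexed. The ALLOW FILTERING in your deltaQuery suggests that Cassandra may be filtering rows rather than using an index. I am not familiar with the Cassandra JDBC driver, so you should double-check.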

One last thing: if you are using DataStax Enterprise, you can query it via Solr, provided you have configured it for that. In that case you could also try indexing with SolrEntityProcessor, and with a request parameter trick you can do both full and delta indexing. I have used this successfully in the past.
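A minimal sketch of what the SolrEntityProcessor variant might look like follows. The source URL, the core name `source_core`, the request parameter `sourceQuery` and the `last_modified` field on the source side are all assumptions made for illustration; none of them come from the question.

```xml
<dataConfig>
  <document>
    <!-- Pulls documents from another Solr endpoint instead of JDBC.
         The query is supplied per request through the hypothetical
         parameter "sourceQuery". -->
    <entity name="source"
            processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/source_core"
            query="${dataimporter.request.sourceQuery}"
            rows="1000"/>
  </document>
</dataConfig>
```

With this sketch, passing `sourceQuery=*:*` together with `clean=true` would act as a full import, while something like `sourceQuery=last_modified:[2017-08-07T00:00:00Z TO *]` with `clean=false` would act as a delta, assuming the source index exposes such a date field.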



Source: https://stackoverflow.com/questions/45540956/efficiency-aspect-of-delta-import-in-solr
