How can I schedule data imports in Solr?

Submitted by 人走茶凉 on 2019-12-17 15:32:09

Question


The wiki page http://wiki.apache.org/solr/DataImportHandler explains how to index data using the DataImportHandler, but its example starts the import by sending a command manually. How can I schedule a job to run the import on a regular basis?


Answer 1:


On UNIX/Linux, cron jobs are your friends! On Windows, there is Task Scheduler.

UPDATE
To do it from Java code: since this is a simple GET request, you can use the Apache Commons HttpClient library. See the HttpClient tutorial on using GetMethod.
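A minimal sketch of the plain-GET approach, assuming Java 11+ and swapping in the JDK's built-in java.net.http.HttpClient instead of the Commons HttpClient GetMethod mentioned above. The class name DataImportTrigger and the core name core1 are hypothetical. Note that a 200 response only means Solr accepted the command; the DataImportHandler starts the import in a separate thread.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends a DataImportHandler command as a plain HTTP GET (Java 11+).
public class DataImportTrigger {

    // Builds e.g. http://localhost:8983/solr/core1/dataimport?command=full-import
    static URI importUri(String solrBase, String core, String command) {
        // Drop a trailing slash so we don't produce ".../solr//core1"
        String base = solrBase.endsWith("/")
                ? solrBase.substring(0, solrBase.length() - 1)
                : solrBase;
        String encoded = URLEncoder.encode(command, StandardCharsets.UTF_8);
        return URI.create(base + "/" + core + "/dataimport?command=" + encoded);
    }

    // Sends the GET and returns the HTTP status code. A 200 only means the command
    // was accepted; the import itself keeps running in the background.
    static int trigger(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        URI uri = importUri("http://localhost:8983/solr", "core1", "full-import");
        System.out.println("HTTP " + trigger(HttpClient.newHttpClient(), uri));
    }
}
```

Calling trigger(...) from any scheduler (cron via a small Java main, a Timer, Quartz) gives you the periodic import.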

If you need to send other requests to Solr programmatically, you should probably use the SolrJ library. It can send all the basic commands to Solr, and it can be configured to reach any Solr request handler:

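import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// SolrJ 1.4/3.x API (later replaced by HttpSolrServer, then HttpSolrClient); throws MalformedURLException
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
// Equivalent to GET http://localhost:8983/solr/dataimport?command=full-import
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport");
server.request(request); // throws SolrServerException, IOException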



Answer 2:


I got it working with the following steps:

  1. Create the classes ApplicationListener, HTTPPostScheduler and SolrDataImportProperties (source code is listed at http://wiki.apache.org/solr/DataImportHandler#Scheduling). I believe these classes haven't been committed yet.

  2. Add the following listener to Solr web.xml file:

    <listener>
       <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
    </listener>
    
  3. Configure dataimport.properties as described on the wiki page.
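Step 3 is configuration only. As a rough, hypothetical illustration of how scheduler settings in a properties file can drive the import request, the sketch below reads a few keys with java.util.Properties and builds the URL a scheduler would call. The key names (server, port, webapp, params, interval) are assumed to follow the wiki's example and the defaults are assumptions; check both against the wiki page. This is not the wiki's SolrDataImportProperties class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: turn scheduler settings into a request URL and an interval.
// Key names and default values are assumptions, not a confirmed file format.
public class SchedulerConfig {
    final String url;
    final int intervalMinutes;

    private SchedulerConfig(String url, int intervalMinutes) {
        this.url = url;
        this.intervalMinutes = intervalMinutes;
    }

    // Returns the trimmed value, or the fallback when the key is missing or blank
    private static String get(Properties p, String key, String fallback) {
        String v = p.getProperty(key);
        return (v == null || v.trim().isEmpty()) ? fallback : v.trim();
    }

    static SchedulerConfig parse(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        String params = get(p, "params", null); // mandatory: path + query after the webapp
        if (params == null) {
            throw new IllegalArgumentException("'params' is required");
        }
        String url = "http://" + get(p, "server", "localhost")
                + ":" + get(p, "port", "8983")
                + "/" + get(p, "webapp", "solr")
                + params;
        int interval = Integer.parseInt(get(p, "interval", "30")); // minutes between runs
        return new SchedulerConfig(url, interval);
    }

    public static void main(String[] args) throws IOException {
        SchedulerConfig c = SchedulerConfig.parse(
                "server=localhost\nport=8983\nwebapp=solr\n"
                + "params=/core1/dataimport?command=delta-import&clean=false&commit=true\n"
                + "interval=10\n");
        System.out.println(c.url + " every " + c.intervalMinutes + " min");
    }
}
```

Keeping the URL and interval in a file means you can change the schedule without recompiling the listener.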




Answer 3:


Simply add this line to your crontab (edit it with the crontab -e command):

0,30 * * * * /usr/bin/wget -q -O /dev/null "http://<solr_host>:8983/solr/<core_name>/dataimport?command=full-import"

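This runs a full import every 30 minutes (at minutes 0 and 30 of every hour). Replace <solr_host> and <core_name> with your own host and core name. The -q -O /dev/null options keep wget quiet and stop it from saving a new response file on every run.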




Answer 4:


There's a fresh patch by Esteve Fernandez that makes the whole thing work on Unix/Linux: https://issues.apache.org/jira/browse/SOLR-2305

@Eldo If you need more help building your own JAR, just drop a question here.




Answer 5:


This question is a bit old, but I created a Windows WPF application and service to deal with this, because cron jobs and Task Scheduler entries become hard to maintain when you have many cores or environments.

https://github.com/systemidx/SolrScheduler

You just drop a JSON file into a specified folder, and the service uses a REST client to issue the commands to Solr.




Answer 6:


You can use Quartz for this; it works much like crontab on Linux. For a simple fixed interval, though, the TimerTask class built into the JDK is usually enough.
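A minimal sketch of the TimerTask approach using only the JDK (java.util.Timer); Quartz is not shown. The class name ImportTimer and the trigger callback are hypothetical; in practice the callback would send the dataimport GET request. The design choice worth noting: an exception escaping a TimerTask kills the Timer thread and silently stops all future runs, so the sketch catches it.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch: run an import trigger periodically with java.util.Timer.
public class ImportTimer {

    // Runs trigger immediately, then periodMillis after each run finishes (fixed delay).
    // Cancel the returned Timer on shutdown, e.g. when the web app stops.
    static Timer start(Runnable trigger, long periodMillis) {
        Timer timer = new Timer("solr-import-timer", true); // daemon: won't block JVM exit
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    trigger.run();
                } catch (RuntimeException e) {
                    // Without this catch, one failure would terminate the Timer thread
                    System.err.println("Import trigger failed: " + e);
                }
            }
        }, 0, periodMillis);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder trigger; replace with code that calls /dataimport?command=full-import
        Timer timer = start(() -> System.out.println("full-import requested"), 30L * 60 * 1000);
        Thread.sleep(1000);
        timer.cancel();
    }
}
```

Unlike cron, this schedule lives inside the JVM, so it stops whenever the application does.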



Source: https://stackoverflow.com/questions/3206171/how-can-i-schedule-data-imports-in-solr
