Question
The wiki page at http://wiki.apache.org/solr/DataImportHandler explains how to index data using DataImportHandler, but its example uses a command to initiate the import operation. How can I schedule a job to do this on a regular basis?
Answer 1:
On UNIX/Linux, cron jobs are your friends! On Windows, there is Task Scheduler.
UPDATE
To do it from Java code, since this is a simple GET request, you can use the HTTP Client library. See this tutorial on using the GetMethod.
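As a minimal sketch of that GET request, the following uses the JDK's built-in HttpURLConnection rather than the HTTP Client library, so it needs no extra dependency. The DataImportTrigger class, its method names, the timeouts, and the http://localhost:8983/solr base URL are assumptions for illustration, not part of Solr.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; the class and method names are illustrative only.
class DataImportTrigger {

    // Builds the DataImportHandler URL for a command such as "full-import".
    // The command is assumed to be URL-safe already (no encoding is applied).
    static String buildImportUrl(String solrBaseUrl, String command) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/dataimport?command=" + command;
    }

    // Sends the GET request and returns the HTTP status code.
    static int triggerImport(String solrBaseUrl, String command) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(buildImportUrl(solrBaseUrl, command)).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5000);   // assumed timeouts, tune as needed
            conn.setReadTimeout(30000);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        try {
            int status = triggerImport("http://localhost:8983/solr", "full-import");
            System.out.println("Solr responded with HTTP " + status);
        } catch (IOException e) {
            System.err.println("Could not reach Solr: " + e.getMessage());
        }
    }
}
```

The request only starts the import. Solr runs it in the background, so the status code shows that the command was accepted, not that the import has finished.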
If you need to send other requests to Solr programmatically, you should probably use the Solrj library. It lets you send all the basic commands to Solr, and it can be configured to access any Solr handler:
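import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
// Same parameter as in the URL form: ?command=full-import
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");
QueryRequest request = new QueryRequest(params);
// Route the request to the DataImportHandler instead of the default /select
request.setPath("/dataimport");
server.request(request);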
Answer 2:
I was able to make it work by following these steps:
Create classes ApplicationListener, HTTPPostScheduler and SolrDataImportProperties (source code listed on http://wiki.apache.org/solr/DataImportHandler#Scheduling). I believe these classes haven't been committed yet.
Add the following listener to Solr web.xml file:
<listener> <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class> </listener>
Configure dataimport.properties as per instructions in the wiki page.
Answer 3:
Simply add this line to your crontab using the crontab -e command:
0,30 * * * * /usr/bin/wget http://<solr_host>:8983/solr/<core_name>/dataimport?command=full-import
This runs a full import every 30 minutes. Replace <solr_host> and <core_name> with the values for your configuration.
Answer 4:
There's a fresh patch by Esteve Fernandez that makes the whole thing work on Unix/Linux: https://issues.apache.org/jira/browse/SOLR-2305
@Eldo If you're going to need more help in building your own JAR just drop a question here...
Answer 5:
This is a bit old, but I created a Windows WPF application and service to deal with this, as cron jobs and Task Scheduler are a bit difficult to maintain if you have a lot of cores / environments.
https://github.com/systemidx/SolrScheduler
You basically just drop in a JSON file in a specified folder and it will use a REST client to issue the commands to Solr.
Answer 6:
You can use Quartz for this; it works like crontab on Linux. For a simple periodic import, though, the java.util.TimerTask class built into the JDK is enough.
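A rough sketch of the TimerTask approach might look like the following. The ImportScheduler class, its method name, and the 200 ms demo period are hypothetical, and the job here is only a placeholder that prints a message; a real job would send the full-import request to Solr.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the class and method names are illustrative only.
class ImportScheduler {

    // Runs the job immediately, then again periodMillis after each run,
    // on a daemon timer thread.
    static Timer schedulePeriodically(final Runnable job, long periodMillis) {
        Timer timer = new Timer("solr-import-timer", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would kill the timer thread,
                    // so report the failure and keep the schedule alive.
                    System.err.println("Import job failed: " + e.getMessage());
                }
            }
        }, 0L, periodMillis);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch runs = new CountDownLatch(3);
        // Placeholder job; a real one would issue the HTTP request to Solr.
        Timer timer = schedulePeriodically(() -> {
            System.out.println("Triggering full-import (placeholder)");
            runs.countDown();
        }, 200L);
        runs.await(5, TimeUnit.SECONDS); // wait for three demo runs
        timer.cancel();
    }
}
```

For a 30-minute schedule the period would be 30 * 60 * 1000 milliseconds. Unlike cron, the schedule lives only as long as the JVM that owns the timer.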
Source: https://stackoverflow.com/questions/3206171/how-can-i-schedule-data-imports-in-solr