Apache Nutch REST API

Submitted by 自闭症网瘾萝莉.ら on 2019-12-13 20:04:28

Question


I'm trying to launch a crawl via the REST API. A crawl starts with injecting URLs. Using the Chrome developer tool "Advanced Rest Client", I'm trying to build the POST payload below, but the response I get is a 400 Bad Request.

POST - http://localhost:8081/job/create

Payload

{
  "crawl-id":"crawl-01",
  "type":"INJECT",
  "config-id":"default",
  "args":{ "path/to/seedlist/directory"}
}

My problem is in the args; I think more is needed, but I'm not sure. The NutchRESTAPI page gives these samples for creating a job:

POST /job/create
   {
      "crawlId":"crawl-01",
      "type":"FETCH",
      "confId":"default",
      "args":{"someParam":"someValue"}
   }

POST /job/create
   {
      "crawlId":"crawl-01",
      "jobClassName":"org.apache.nutch.fetcher.FetcherJob",
      "confId":"default",
      "args":{"someParam":"someValue"}
   }

I'm not sure what parameters or values to give each of the commands to complete a job (e.g. Inject, Generate, Fetch, Parse, and UpdateDb). Can someone clear this up? How do I tell the API where to look for the seed list?

UPDATE

When trying to complete the Generate command, I ran into a classException error: the value for the topN key must be of type long, but the API reads it as either a string or an int. I found a fix that is supposed to be included in the 2.3.1 release (release date: TBA), applied it, and recompiled my code. It now works.


Answer 1:


At the time of this posting, the REST API is not yet complete. A much more detailed document exists, though it's still not comprehensive. It is linked to in the following email from the user mailing list (which you might want to consider joining):

http://www.mail-archive.com/user%40nutch.apache.org/msg13652.html

To answer your question about the seed list: you can create the seed list through REST, or you can use the "seedDir" argument:

{
    "args":{
        "seedDir":"/path/to/seed/directory"
    },
    "confId":"default",
    "crawlId":"sample-crawl-01",
    "type":"INJECT"
}
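
For anyone scripting this rather than using a browser client, here is a minimal Python sketch of the same INJECT request. It assumes the server address from the question (http://localhost:8081/job/create). The helper names build_inject_payload, build_request and is_valid_json are made up for illustration and are not part of Nutch. The last helper also shows one likely reason for the 400: the args object in the question holds a bare string instead of a key/value pair, so the body is not valid JSON.

```python
import json
import urllib.request

# Assumption: the Nutch REST server from the question, listening on port 8081.
NUTCH_URL = "http://localhost:8081/job/create"


def build_inject_payload(seed_dir, crawl_id="sample-crawl-01", conf_id="default"):
    """Return the INJECT job body from the answer as a Python dict."""
    return {
        "args": {"seedDir": seed_dir},  # args must be key/value pairs
        "confId": conf_id,
        "crawlId": crawl_id,
        "type": "INJECT",
    }


def build_request(payload, url=NUTCH_URL):
    """Build (but do not send) a JSON POST request for the payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    request = build_request(build_inject_payload("/path/to/seed/directory"))
    # Sending requires a running Nutch server; uncomment to actually submit.
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode("utf-8"))
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Building the body with json.dumps instead of typing it by hand rules out syntax slips such as a missing comma or a value without a key. Whether the server accepts any other args for INJECT is not something this sketch tries to answer.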


Source: https://stackoverflow.com/questions/30919467/apache-nutch-rest-api
