Nutch No agents listed in 'http.agent.name'

≯℡__Kan透↙ submitted on 2019-11-29 02:54:15

Question


Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
        at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Every time I run ./nutch crawl urls -dir crawl -depth 3 -topN 5, Nutch throws this error. I have set the following in both my nutch-site.xml and nutch-default.xml:

<property>
  <name>http.agent.name</name>
  <value>blah</value>
</property>

I took the description out to make it easier to read. But I fail to see where else the agent name can be specified. If anybody has any advice, I would be grateful.


Answer 1:


Are you using Nutch 1.3? If so, make sure you changed nutch-site.xml (and not nutch-default.xml) in runtime/local/conf. Changes made to the conf in NUTCH_HOME/conf won't be copied to the runtime directories unless you rebuild with ant.
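
To illustrate which file is meant, here is a minimal sketch of what runtime/local/conf/nutch-site.xml could contain. The agent name blah is the placeholder from the question; the configuration wrapper is the usual Hadoop-style layout and is an assumption here, not a copy of the asker's actual file.

```xml
<?xml version="1.0"?>
<!-- Sketch of runtime/local/conf/nutch-site.xml, the copy read when running from runtime/local -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- "blah" is the placeholder from the question; the value must not be empty -->
    <value>blah</value>
  </property>
</configuration>
```

Overrides belong in nutch-site.xml; nutch-default.xml is best left untouched so the shipped defaults remain as a reference.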




Answer 2:


Try setting the agent name in http.robots.agents as well. It worked for me; I didn't get that message afterwards.
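
A hedged sketch of that extra property, reusing the placeholder agent name blah from the question. Putting the agent name first and keeping * at the end of the list follows the description of this property in nutch-default.xml as I recall it, so treat the exact value as an assumption and check it against your own copy.

```xml
<!-- Goes inside <configuration> in nutch-site.xml, next to http.agent.name -->
<property>
  <name>http.robots.agents</name>
  <!-- Assumed value: the http.agent.name value first, then the default wildcard -->
  <value>blah,*</value>
</property>
```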



Source: https://stackoverflow.com/questions/6582934/nutch-no-agents-listed-in-http-agent-name
