Apache Nutch 2.1 different batch id (null)

Posted by 烂漫一生 on 2019-12-09 06:09:34

Question


I crawl a few sites with Apache Nutch 2.1.

While crawling, I see the following message for a lot of pages, e.g.:
Skipping http://www.domainname.com/news/subcategory/111111/index.html; different batch id (null).

What causes this message?
How can I resolve it? The pages reported with "different batch id (null)" are not stored in the database.

The site that I crawled is based on Drupal, but I have also tried many other non-Drupal sites.


Answer 1:


I don't think the message indicates a problem. A batch_id has not been assigned to every URL, so any URL whose batch_id is null is skipped. Such a URL is picked up once a batch_id has been assigned to it by a later generate step.
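
The skip rule described above can be illustrated with a small, self-contained Java sketch. This is not Nutch's actual source: the class `BatchIdFilterSketch`, the methods `shouldProcess` and `describe`, and the batch id value `batch-1` are hypothetical names chosen for illustration, and the sketch only assumes that a page is processed when its stored batch id equals the id of the running batch.

```java
public class BatchIdFilterSketch {

    /**
     * Hypothetical rule: process a page only if a batch id was assigned to it
     * and that id matches the batch currently being run.
     */
    static boolean shouldProcess(String pageBatchId, String currentBatchId) {
        if (pageBatchId == null) {
            return false; // no batch id assigned to this URL yet
        }
        return pageBatchId.equals(currentBatchId);
    }

    /** Builds a log line in the style of the message quoted in the question. */
    static String describe(String url, String pageBatchId, String currentBatchId) {
        if (shouldProcess(pageBatchId, currentBatchId)) {
            return "Processing " + url;
        }
        // Concatenating a null String yields the text "null", hence "(null)".
        return "Skipping " + url + "; different batch id (" + pageBatchId + ")";
    }

    public static void main(String[] args) {
        String url = "http://www.domainname.com/news/subcategory/111111/index.html";
        // URL that has not been assigned a batch id yet: it is skipped.
        System.out.println(describe(url, null, "batch-1"));
        // Same URL after a batch id has been assigned: it is processed.
        System.out.println(describe(url, "batch-1", "batch-1"));
    }
}
```

Under this assumption, a URL that has only just been discovered carries no batch id, so it is skipped in the current round and becomes eligible once a batch id has been assigned to it.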



Source: https://stackoverflow.com/questions/14828438/apache-nutch-2-1-different-batch-id-null
