Nutch not crawling URLs except the one specified in seed.txt

悲&欢浪女 2021-01-15 19:53

I am using Apache Nutch 1.12, and the URL I am trying to crawl is something like https://www.mywebsite.com/abc-def/, which is the only entry in my seed.txt file. Since I don't…

2 Answers
  •  陌清茗 (OP)
    2021-01-15 20:14

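    You may try tweaking the properties defined in conf/nutch-default.xml: for example, raise the number of outlinks processed per page (db.max.outlinks.per.page) or adjust the fetcher properties. If you decide to override a property, copy its definition into conf/nutch-site.xml and set the new value there; values in nutch-site.xml take precedence over those in nutch-default.xml.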
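    A minimal sketch of such an override in conf/nutch-site.xml, assuming the per-page outlink limit is what keeps the crawl from going beyond the seed page (that cause is an assumption, not something confirmed in the question). nutch-default.xml documents that a negative db.max.outlinks.per.page means all outlinks of a page are processed.

    ```xml
    <?xml version="1.0"?>
    <configuration>
      <!-- Overrides the value inherited from conf/nutch-default.xml. -->
      <property>
        <name>db.max.outlinks.per.page</name>
        <!-- A negative value means every outlink of a page is processed. -->
        <value>-1</value>
      </property>
    </configuration>
    ```

    Only the properties you want to change need to appear in nutch-site.xml; everything else keeps its value from nutch-default.xml.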
