Nutch not crawling URLs except the one specified in seed.txt
I am using Apache Nutch 1.12, and the URLs I am trying to crawl look like https://www.mywebsite.com/abc-def/, which is the only entry in my seed.txt file. Since I don't want any page to be crawled that doesn't have "abc-def" in its URL, I have put the following line in regex-urlfilter.txt:

+^https://www.mywebsite.com/abc-def/(.+)*$

When I run the following crawl command:

**/bin/crawl -i -D solr.server.url=http://mysolr:3737/solr/coreName $NUTCH_HOME/urls/ $NUTCH_HOME/crawl 3**

it crawls and indexes only the single URL from seed.txt, and in the 2nd iteration it just says:

Generator: starting at 2017
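
To see which outlinks the filter line above would let through, it can help to test the pattern outside Nutch. The following is a minimal sketch, not Nutch's own code: it assumes the rule is applied as an unanchored regex search on the full URL and that the leading `+` only marks the rule as "accept". Nutch uses Java regular expressions, but this particular pattern should behave the same way in Python. The function name `accepted` and the candidate URLs are hypothetical, chosen only for illustration.

```python
import re

# Pattern copied from the regex-urlfilter.txt line in the question,
# without the leading "+" (assumed to mean "accept URLs matching this").
RULE = r"^https://www.mywebsite.com/abc-def/(.+)*$"


def accepted(url: str) -> bool:
    """Return True if the pattern is found in the URL."""
    return re.search(RULE, url) is not None


# Hypothetical outlinks a page under /abc-def/ might contain.
candidates = [
    "https://www.mywebsite.com/abc-def/",
    "https://www.mywebsite.com/abc-def/page-1",
    "https://www.mywebsite.com/other/",
    "http://www.mywebsite.com/abc-def/page-1",   # http instead of https
    "https://mywebsite.com/abc-def/page-1",      # no "www."
]

for url in candidates:
    print("accept" if accepted(url) else "reject", url)
```

If the links on the seed page use a different scheme or host form than the one written in the rule, they would be rejected under these assumptions, which is one thing worth checking. Note also that the dots in the pattern are unescaped, so they match any character, and that `(.+)*` could be written more simply as `.*`.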