I crawled one URL with Nutch 2.1, and now I want to re-crawl its pages after they have been updated. How can I do this? How can I tell that a page has been updated?
What about http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/
This is discussed in: How to recrawle nutch
I am wondering whether the above-mentioned solution will actually work. I am trying it as we speak. I crawl news sites, and they update their front pages quite frequently, so I need to re-crawl the index/front page often and fetch the newly discovered links.
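On the "how can I know that a page is updated" part of the question: one common approach is to store a digest of each page's content and compare it against the digest of the next fetch, then re-fetch changing pages sooner and stable pages later. The sketch below only illustrates that idea in plain Python. It is not Nutch code, and the names `content_signature`, `has_changed`, `next_interval` and all the rate and interval values are made up for the example.

```python
import hashlib


def content_signature(content: bytes) -> str:
    """Return a hex digest that identifies this exact page content."""
    return hashlib.md5(content).hexdigest()


def has_changed(stored_signature: str, new_content: bytes) -> bool:
    """True when the newly fetched content differs from what was stored."""
    return content_signature(new_content) != stored_signature


def next_interval(current: float, changed: bool,
                  dec_rate: float = 0.2, inc_rate: float = 0.4,
                  min_interval: float = 60.0,
                  max_interval: float = 30 * 24 * 3600.0) -> float:
    """Shrink the re-fetch interval (seconds) for pages that changed,
    grow it for pages that did not, and clamp to [min, max].
    The default rates and bounds are illustrative assumptions."""
    factor = (1.0 - dec_rate) if changed else (1.0 + inc_rate)
    return max(min_interval, min(max_interval, current * factor))


# Hypothetical front page fetched twice, one hour apart.
old_page = b"<html>front page, 10:00 headlines</html>"
new_page = b"<html>front page, 11:00 headlines</html>"

stored = content_signature(old_page)   # saved after the first crawl
changed = has_changed(stored, new_page)
interval = next_interval(3600.0, changed)
print("changed:", changed, "next fetch in", interval, "seconds")
```

Note that a byte-for-byte digest also flags trivial differences such as embedded timestamps or rotating ads, so for a busy news front page it may make more sense to hash only the extracted text or the set of outgoing links.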