I'm new to search engines and web crawlers. I want to store all the original pages of a particular website as HTML files, but with Apache Nutch I can only get the bina
The answers here are obsolete. It is now possible to get the plain HTML files directly with nutch dump. Please see this answer.
nutch dump
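For reference, a minimal sketch of what such a nutch dump call might look like, assuming a Nutch 1.x install and an already completed crawl. The install location (/opt/apache-nutch), the segments path (crawl/segments) and the output directory (dump) are placeholders I made up for illustration, not values from the question, so adjust them to your setup:

```shell
# Hypothetical locations: change these to match your own install and crawl.
NUTCH_HOME="${NUTCH_HOME:-/opt/apache-nutch}"
SEGMENTS_DIR="${SEGMENTS_DIR:-crawl/segments}"
OUTPUT_DIR="${OUTPUT_DIR:-dump}"

if [ -x "$NUTCH_HOME/bin/nutch" ]; then
  mkdir -p "$OUTPUT_DIR"
  # Write the raw fetched content of the crawled segments into OUTPUT_DIR.
  "$NUTCH_HOME/bin/nutch" dump -segment "$SEGMENTS_DIR" -outputDir "$OUTPUT_DIR"
else
  # Nothing is run if the launcher script is missing at the assumed path.
  echo "bin/nutch not found under $NUTCH_HOME; set NUTCH_HOME first" >&2
fi
```

The dumped files should then appear under the output directory. Running bin/nutch dump without arguments prints the options your version supports, which is worth checking since they can differ between releases.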