How do I save the origin html file with Apache Nutch

野的像风 2020-12-06 14:13

I'm new to search engines and web crawlers. I want to store all the original pages of a particular website as HTML files, but with Apache Nutch I can only get the binary…

5 Answers
  •  小蘑菇 (OP)
    2020-12-06 14:26

    The answers here are obsolete. It is now straightforward to get the plain HTML files with the nutch dump command. Please see this answer.
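    A minimal Python sketch of scripting that command. It assumes a local Nutch 1.x install where bin/nutch dump runs the FileDumper tool with a -segment option (the crawl's segments directory) and an -outputDir option (where the raw fetched files are written). The Nutch home, segments path, and output directory below are hypothetical placeholders. Check the usage message printed by bin/nutch dump on your version before relying on these option names.

```python
import os
import subprocess
from typing import List


def build_dump_command(nutch_home: str, segments_dir: str, output_dir: str) -> List[str]:
    """Return the argv for `bin/nutch dump` (assumed: Nutch 1.x FileDumper).

    Assumption: -segment points at the directory holding the crawl's segments,
    and -outputDir is where the original fetched content (e.g. HTML) is written.
    """
    nutch_bin = os.path.join(nutch_home, "bin", "nutch")
    return [nutch_bin, "dump", "-segment", segments_dir, "-outputDir", output_dir]


def dump_raw_pages(nutch_home: str, segments_dir: str, output_dir: str) -> None:
    """Run the dump; raises subprocess.CalledProcessError if Nutch exits non-zero."""
    cmd = build_dump_command(nutch_home, segments_dir, output_dir)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: adjust to your Nutch install and crawl directory.
    print(build_dump_command("/opt/nutch", "crawl/segments", "html_dump"))
```

    Building the argument list separately from running it keeps the command inspectable (and testable) without needing Nutch installed; passing a list to subprocess.run also avoids shell-quoting problems with paths.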
