Apache Nutch 2.3.1 map-reduce timeout occurred while updating the score
Question: I have a 4-node cluster, and Apache Nutch 2.3.1 is configured to crawl a few websites. After crawling, I have to adjust their scores a little bit with a custom job. In the job, the mapper just groups the documents using the domain as the key. In the reducer, I sum their effective text bytes and compute the average. I then assign the log of the average bytes as the score. But the reducer ran for 14 hours and then a timeout occurred, whereas a built-in Nutch job, e.g. updatedb, finishes in 3 to 4 hours. Where is
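For reference, the scoring described above can be sketched in plain Java without any Hadoop or Nutch classes. This is only an illustration of the arithmetic, not the actual job: the class `DomainScoreSketch`, the `Doc` type, and the method `scoresByDomain` are hypothetical names, and the natural logarithm is an assumption, since the post does not say which base is used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; it is not part of Nutch or of the custom job.
class DomainScoreSketch {

    // One crawled document: its domain (the mapper's output key)
    // and its effective text size in bytes.
    static final class Doc {
        final String domain;
        final long textBytes;

        Doc(String domain, long textBytes) {
            this.domain = domain;
            this.textBytes = textBytes;
        }
    }

    // Returns score = ln(average effective text bytes) for each domain.
    static Map<String, Double> scoresByDomain(List<Doc> docs) {
        // Grouping step (the mapper's role): keep a running
        // {byte sum, document count} pair per domain.
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (Doc d : docs) {
            long[] t = totals.computeIfAbsent(d.domain, k -> new long[2]);
            t[0] += d.textBytes;
            t[1] += 1;
        }

        // Reducing step: average the bytes, then take the log.
        // Assumption: natural log. An average of 0 gives -Infinity.
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            double average = (double) e.getValue()[0] / e.getValue()[1];
            scores.put(e.getKey(), Math.log(average));
        }
        return scores;
    }
}
```

The sketch keeps only a running sum and a count per domain, which is all the average needs; no document has to be held in memory once its bytes are added.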