Apache Nutch: FetcherJob throws NoSuchElementException deep in Gora

旧巷少年郎 2020-12-21 04:17

I'm running Apache Nutch 2.3.1 out of the box, which uses Gora 0.6.1. I've followed the instructions here: http://wiki.apache.org/nutch/RunNutchInEclipse

It ran fi

1 Answer
  • 2020-12-21 04:40

    I confirm the problem is in MemStore.

    In 0.6.1 there is a bug: https://github.com/apache/gora/blob/apache-gora-0.6.1/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L128

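    That is already fixed in master: https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L155 , where the call to #firstKey() is guarded by an #isEmpty() check. Calling firstKey() on an empty sorted map throws NoSuchElementException, which is the exception you are seeing.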
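To illustrate the failure mode, here is a minimal sketch using a plain java.util.TreeMap. It is not Gora's actual code: the class and method names (FirstKeyGuardDemo, unguardedFirstKey, guardedFirstKey) are made up for this example, and returning null for the empty case is an assumption, not necessarily what MemStore does.

```java
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical demo class; it only mirrors the shape of the bug, not Gora's code.
class FirstKeyGuardDemo {

    // Unguarded access: throws NoSuchElementException when the map is empty.
    static String unguardedFirstKey(SortedMap<String, String> map) {
        return map.firstKey();
    }

    // Guarded access: checks isEmpty() first and returns null for an empty map.
    static String guardedFirstKey(SortedMap<String, String> map) {
        return map.isEmpty() ? null : map.firstKey();
    }

    public static void main(String[] args) {
        SortedMap<String, String> empty = new TreeMap<>();
        try {
            unguardedFirstKey(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded: NoSuchElementException");
        }
        System.out.println("guarded: " + guardedFirstKey(empty));
    }
}
```

The guarded variant is the general pattern the fix in master follows: test isEmpty() before asking a sorted map for its first key.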

    However, don't try to upgrade to Gora 0.7-SNAPSHOT, because Nutch has not been adapted to it yet.

    Edit

    If you want to use Gora 0.7-SNAPSHOT with Nutch 2.x, you may be able to get it working like this:

    1. Download Gora's master branch, which is at version 0.7-SNAPSHOT
    2. Run mvn install in gora/ to install it into Maven's local repository
    3. Apply this patch to Nutch: https://paste.apache.org/jjqz so that Nutch 2.3.1 works with Gora 0.7-SNAPSHOT
    4. Follow the Nutch tutorial as usual

    I hope it works :)

    Edit 2

    As for using HBase, a local installation for experimenting is quite easy to set up.

    1. As stated in Nutch2Tutorial, download HBase 0.98.8-hadoop2
    2. Extract the tar.gz file into a directory, for example: /home/you/hbase
    3. cd /home/you/hbase/bin
    4. ./start-hbase.sh

    Now HBase is up and running. Configure Nutch:

    ivy/ivy.xml: See @Emmanuel's comment about the ivy dependency configuration for HBase.

    gora.properties:

    gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
    gora.datastore.autocreateschema=true
    gora.datastore.scanner.caching=100
    

    nutch-site.xml:

    <configuration>
    <property>
     <name>storage.data.store.class</name>
     <value>org.apache.gora.hbase.store.HBaseStore</value>
     <description>Default class for storing data</description>
    </property>
    </configuration>
    

    Done. Everything else falls back to HBase's default configuration: localhost, /tmp/..., and so on.
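If a crawl still fails the same way, it may be worth checking that the gora.properties actually being picked up names HBaseStore rather than MemStore. The following is a rough sketch using only java.util.Properties; the class and method names (GoraPropertiesCheck, usesHBaseStore) are hypothetical, and it parses a string rather than locating the file on Nutch's classpath, which you would need to adapt.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not part of Nutch or Gora.
class GoraPropertiesCheck {

    static final String HBASE_STORE = "org.apache.gora.hbase.store.HBaseStore";

    // Returns true if the given properties text sets gora.datastore.default to HBaseStore.
    static boolean usesHBaseStore(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return HBASE_STORE.equals(props.getProperty("gora.datastore.default"));
    }

    public static void main(String[] args) {
        String text = "gora.datastore.default=org.apache.gora.hbase.store.HBaseStore\n"
                + "gora.datastore.autocreateschema=true\n";
        System.out.println("uses HBaseStore: " + usesHBaseStore(text));
    }
}
```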
