I'm running Apache Nutch 2.3.1 out of the box, which uses Gora 0.6.1. I've followed the instructions here: http://wiki.apache.org/nutch/RunNutchInEclipse
It ran fi
I confirm the problem is in MemStore.
In 0.6.1 there is a bug: https://github.com/apache/gora/blob/apache-gora-0.6.1/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L128
That is already fixed in master: https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L155 , where the access to #firstKey() is guarded by #isEmpty()
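To illustrate why the guard matters, here is a minimal sketch using a plain java.util.TreeMap. The class and method names (FirstKeyGuard, firstKeyUnguarded, firstKeyGuarded) are hypothetical and this is not Gora's actual code; it only shows the general pattern of checking isEmpty() before calling firstKey(), which throws NoSuchElementException on an empty sorted map.

```java
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the idea behind the MemStore fix; not Gora's code.
final class FirstKeyGuard {

    // Unguarded access: firstKey() throws NoSuchElementException on an empty map.
    static String firstKeyUnguarded(SortedMap<String, String> map) {
        return map.firstKey();
    }

    // Guarded access: check isEmpty() first and return null for an empty map.
    static String firstKeyGuarded(SortedMap<String, String> map) {
        return map.isEmpty() ? null : map.firstKey();
    }

    public static void main(String[] args) {
        SortedMap<String, String> empty = new TreeMap<>();
        System.out.println("guarded result: " + firstKeyGuarded(empty));
        try {
            firstKeyUnguarded(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded call threw NoSuchElementException");
        }
    }
}
```

Returning null for the empty case is just one choice made for this sketch; what the real store does with an empty map is defined by the linked source, not by this example.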
BUT, don't try to update to Gora 0.7-SNAPSHOT, because Nutch has not been adapted to it yet.
If you still want to use Gora 0.7-SNAPSHOT with Nutch 2.x, you might get it working by running:
mvn install
in gora/ to install it in Maven's local repository.
I hope it works :)
About using HBase, it is quite easy to do a local installation for experimenting. Assuming HBase is unpacked in:
/home/you/hbase
cd /home/you/hbase/bin
./start-hbase.sh
Now you have HBase up and running. Configure Nutch:
ivy/ivy.xml: Look at @Emmanuel's comment about HBase's ivy dependency configuration.
gora.properties:
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
gora.datastore.scanner.caching=100
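As a quick sanity check of the properties above, here is a small sketch that parses them with java.util.Properties and verifies that HBaseStore is the selected default datastore. The class GoraPropertiesCheck and its methods are hypothetical helpers and not part of Gora or Nutch. The text is inlined so the example is self-contained; reading the real file instead would be an easy change.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for checking the settings; not part of Gora or Nutch.
final class GoraPropertiesCheck {

    static final String HBASE_STORE = "org.apache.gora.hbase.store.HBaseStore";

    // Parse properties-format text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True when gora.datastore.default points at the HBase store class.
    static boolean usesHBaseStore(Properties props) {
        return HBASE_STORE.equals(props.getProperty("gora.datastore.default"));
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "gora.datastore.default=org.apache.gora.hbase.store.HBaseStore",
                "gora.datastore.autocreateschema=true",
                "gora.datastore.scanner.caching=100");
        Properties props = parse(text);
        System.out.println("HBaseStore selected: " + usesHBaseStore(props));
    }
}
```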
nutch-site.xml:
<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
</configuration>
Done. It will take all the default configurations for HBase: localhost, /tmp/..., blablabla
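If the crawl cannot reach HBase, a plain TCP check can tell you whether anything is listening on localhost. This is only a sketch: PortCheck and isPortOpen are hypothetical names, and port 2181 is an assumption based on ZooKeeper's default client port, which a standalone HBase normally starts with its defaults. Adjust it if your setup differs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of HBase, Gora or Nutch.
final class PortCheck {

    // Try to open a TCP connection; true if something accepts it in time.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 2181 is the ZooKeeper client port of the local HBase.
        boolean open = isPortOpen("localhost", 2181, 1000);
        System.out.println("ZooKeeper port reachable: " + open);
    }
}
```

An open port only shows that some process is listening there; it does not prove that HBase itself is healthy.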