Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'm running Apache Nutch 2.3.1 out of the box, which uses Gora 0.6.1. I've followed the instructions here: http://wiki.apache.org/nutch/RunNutchInEclipse

It ran fine with the InjectorJob.

Now I'm running the FetcherJob, with Gora using MemStore as the data store. My gora.properties contains:

gora.datastore.default=org.apache.gora.memory.store.MemStore

This throws:

2016-10-02 22:55:54,605 ERROR mapreduce.GoraRecordReader (GoraRecordReader.java:nextKeyValue(121)) - Error reading Gora records: null
2016-10-02 22:55:54,605 INFO  mapred.MapTask (MapTask.java:flush(1460)) - Starting flush of map output
2016-10-02 22:55:54,614 INFO  mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2016-10-02 22:55:54,615 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local874667143_0001
java.lang.Exception: java.lang.RuntimeException: java.util.NoSuchElementException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.util.NoSuchElementException
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException
    at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
    at org.apache.gora.memory.store.MemStore.execute(MemStore.java:128)
    at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:73)
    at org.apache.gora.mapreduce.GoraRecordReader.executeQuery(GoraRecordReader.java:67)
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:109)
    ... 12 more
2016-10-02 22:55:55,383 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_local874667143_0001 running in uber mode : false
2016-10-02 22:55:55,385 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
2016-10-02 22:55:55,387 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Job job_local874667143_0001 failed with state FAILED due to: NA
2016-10-02 22:55:55,396 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 0
Exception in thread "main" java.lang.RuntimeException: job failed: name=, jobid=job_local874667143_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:205)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:251)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:314)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:321)

This happens so deep inside Nutch and Gora that I have no idea what is causing it. I tried upgrading to Gora 0.8 and downgrading to Gora 0.6, but the problem is the same in both. I considered switching to another data store such as HBase, but that is overkill for what I need at the moment.

Please help me figure this out.


1 Answer

I confirm the problem is in MemStore.

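In 0.6.1 there is a bug at https://github.com/apache/gora/blob/apache-gora-0.6.1/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L128 : MemStore.execute() calls #firstKey() on its backing ConcurrentSkipListMap without first checking whether the map is empty, and #firstKey() throws NoSuchElementException on an empty map. The map is presumably empty here because MemStore keeps its data only in the memory of the process that created it, so nothing written by a separate InjectorJob run is still there.

This is already fixed in master: https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L155 , where the call to #firstKey() is guarded by #isEmpty().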
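To make the failure mode concrete, here is a minimal, self-contained sketch of the difference between an unguarded and a guarded #firstKey() call. It is not Gora's code: the class name FirstKeyGuardDemo and both method names are hypothetical, and only the ConcurrentSkipListMap behaviour is taken from the JDK.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class; it only mirrors the pattern described in the answer.
class FirstKeyGuardDemo {

    // Unguarded lookup, analogous to the call that fails in the stack trace.
    // ConcurrentSkipListMap.firstKey() throws NoSuchElementException on an empty map.
    static String unguardedFirstKey(ConcurrentSkipListMap<String, String> map) {
        return map.firstKey();
    }

    // Guarded lookup: check isEmpty() first and return null when nothing is stored.
    static String guardedFirstKey(ConcurrentSkipListMap<String, String> map) {
        return map.isEmpty() ? null : map.firstKey();
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> empty = new ConcurrentSkipListMap<>();
        try {
            unguardedFirstKey(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded: NoSuchElementException");
        }
        System.out.println("guarded: " + guardedFirstKey(empty));
    }
}
```

With an empty map, the unguarded call throws and the guarded call returns null, which is why a fetch against an empty MemStore fails in 0.6.1.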

However, don't simply upgrade to Gora 0.7-SNAPSHOT, because Nutch has not been adapted to it yet.

Edit

If you want to use Gora 0.7-SNAPSHOT with Nutch 2.x, the following might work:

  1. Download Gora's master branch, which has version 0.7-SNAPSHOT
  2. Run mvn install in gora/ to install it in Maven's local repository
  3. Apply this patch to Nutch: https://paste.apache.org/jjqz so that Nutch 2.3.1 works with Gora 0.7-SNAPSHOT
  4. Follow the Nutch tutorial as usual

I hope it works :)

Edit 2

About using HBase: a local installation for experimenting is quite easy to set up.

  1. As stated in Nutch2Tutorial, download HBase 0.98.8-hadoop2
  2. Extract the tar.gz file into a directory, for example: /home/you/hbase
  3. cd /home/you/hbase/bin
  4. ./start-hbase.sh

Now you have HBase up and running. Configure Nutch:

ivy/ivy.xml: See @Emmanuel's comment about the ivy dependency configuration for HBase.

gora.properties:

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
gora.datastore.scanner.caching=100

nutch-site.xml:

<configuration>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
</configuration>

Done. Nutch will use HBase's default configuration: localhost, data under /tmp/..., and so on.
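Before launching a job, it can help to confirm which data store gora.properties actually selects, since a leftover MemStore entry would reproduce the original error. The sketch below is only an illustration: the class GoraPropertiesCheck and its methods are hypothetical, it uses plain java.util.Properties rather than any Gora API, and it parses the settings from a string so that it runs on its own. In a real setup you would load the file from Nutch's conf directory instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; it does not use or depend on Gora itself.
class GoraPropertiesCheck {

    // Returns the value of gora.datastore.default, or null if it is not set.
    static String defaultDataStore(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("gora.datastore.default");
    }

    // True when the configured default store is the in-memory MemStore.
    static boolean usesMemStore(String propertiesText) {
        return "org.apache.gora.memory.store.MemStore".equals(defaultDataStore(propertiesText));
    }

    public static void main(String[] args) {
        // Same settings as in the gora.properties shown above.
        String text = String.join("\n",
                "gora.datastore.default=org.apache.gora.hbase.store.HBaseStore",
                "gora.datastore.autocreateschema=true",
                "gora.datastore.scanner.caching=100");
        System.out.println("default store: " + defaultDataStore(text));
        System.out.println("uses MemStore: " + usesMemStore(text));
    }
}
```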
