Nutch2.x+Hadoop 2.5.2+Hbase0.94.26 (continued, part 2)
1. Running bin/nutch generate -topN 5 -crawlId tieba fails with the following error:
java.lang.Exception: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.gora.persistency.Persistent
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.apache.gora.persistency.Persistent
        at org.apache.gora.mapreduce.PersistentDeserializer.deserialize(PersistentDeserializer.java:71)
        at org.apache.gora.mapreduce.PersistentDeserializer.deserialize(PersistentDeserializer.java:35)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
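My first suspicion was an Avro version mismatch, but downgrading Avro from 1.7.7 to 1.7.6 did not fix the problem. I then noticed that when nutch runs, the jars on the classpath all come from Hadoop 2.5.2, and the Avro jar under hadoop-2.5.2/share/hadoop/common/lib/ is version 1.7.4. Replacing it with the 1.7.7 jar resolved the problem.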
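The jar swap described above could be scripted roughly as follows. This is only a sketch: the function name replace_avro_jar, the backup directory naming, and the example paths are assumptions made for illustration, not part of the original setup. It moves the old core Avro jar aside instead of deleting it, so the change can be undone.

```shell
# Sketch: replace the Avro jar bundled with Hadoop by a newer one.
# replace_avro_jar is a hypothetical helper; both arguments are paths you supply.
replace_avro_jar() {
  hadoop_lib="$1"   # e.g. hadoop-2.5.2/share/hadoop/common/lib
  new_jar="$2"      # e.g. /path/to/avro-1.7.7.jar (assumed location)
  backup_dir="${hadoop_lib}-avro-backup"   # sibling directory, off the classpath

  [ -d "$hadoop_lib" ] || { echo "no such directory: $hadoop_lib" >&2; return 1; }
  [ -f "$new_jar" ] || { echo "no such jar: $new_jar" >&2; return 1; }

  mkdir -p "$backup_dir"
  # Match only avro-<version>.jar; other avro-* artifacts, if any, are left alone.
  for old in "$hadoop_lib"/avro-[0-9]*.jar; do
    [ -e "$old" ] || continue
    mv "$old" "$backup_dir/"
  done
  cp "$new_jar" "$hadoop_lib/"
}
```

For example, replace_avro_jar hadoop-2.5.2/share/hadoop/common/lib /path/to/avro-1.7.7.jar, followed by re-running the generate command to confirm the ClassCastException is gone.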
2. Run bin/nutch fetch 1421804965-1372033824 -crawlId tieba -threads 50, where 1421804965-1372033824 is the batch ID obtained by running get 'tieba_webpage','com.baidu.tieba:http/' in the hbase shell, which returns f:bid timestamp=1421804970851, value=1421804965-1372033824
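Rather than copying the batch ID by hand, the f:bid value can be pulled out of the hbase shell output with a small filter. The sketch below assumes the output line has the shape shown above (f:bid ... value=<id>); extract_batch_id is a hypothetical helper name, and piping the get command into hbase shell on stdin is an assumption about how the shell is invoked here.

```shell
# Sketch: read hbase shell output on stdin and print the first f:bid value.
# extract_batch_id is a hypothetical helper, not part of Nutch or HBase.
extract_batch_id() {
  sed -n 's/^.*f:bid.*value=\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Assumed usage (requires a running HBase, so it is left commented out):
# batch_id=$(echo "get 'tieba_webpage','com.baidu.tieba:http/'" | hbase shell | extract_batch_id)
# bin/nutch fetch "$batch_id" -crawlId tieba -threads 50
```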
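The fetch command then fails with: No agents listed in 'http.agent.name' property
To fix this, edit the <name>http.agent.name</name> property in conf/nutch-default.xml (or, preferably, override it in conf/nutch-site.xml) and set its value to any non-empty string.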
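As an illustration of what the property entry looks like, the helper below prints a property block for http.agent.name. The function name agent_name_property and the agent string MyTestCrawler are placeholders chosen for this sketch; the printed block is meant to be pasted inside the <configuration> element of the Nutch XML config file you are editing (assumed here to be conf/nutch-site.xml).

```shell
# Sketch: print a <property> entry that sets http.agent.name.
# agent_name_property is a hypothetical helper; $1 is any non-empty agent string.
agent_name_property() {
  cat <<EOF
<property>
  <name>http.agent.name</name>
  <value>$1</value>
</property>
EOF
}

# Example: agent_name_property "MyTestCrawler"
```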
Reposted from: https://www.cnblogs.com/mactech/p/4239163.html