NameNode fails to start with: Journal Storage Directory /var/bigdata/hadoop/full/dfs/jn/dmgeo not formatted
While testing Flink's HA, I rebooted a node that hosted both a JobManager and a NameNode. After the reboot, the NameNode failed to come up, with an error roughly like this:
org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /tmp/hadoop/dfs/journalnode/xxxx not formatted
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:457)
Cause: the metadata stored by the JournalNode had become inconsistent with the NameNode's; 2 of the 3 machines reported this error.
Solution: start the JournalNode on nn1, then run hdfs namenode -initializeSharedEdits to bring the JournalNode back in sync with the NameNode. After that, restarting the NameNode succeeded.
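The recovery steps above can be sketched as the following commands on nn1. This is a sketch, not the exact session from the incident: the daemon start syntax shown is the classic hadoop-daemon.sh form (on Hadoop 3.x, `hdfs --daemon start journalnode` is the equivalent).

```shell
# On nn1: bring up the local JournalNode first
hadoop-daemon.sh start journalnode

# Rewrite the shared edits directory from the local NameNode metadata,
# so the JournalNode's journal matches what the NameNode expects
hdfs namenode -initializeSharedEdits

# Restart the NameNode once the journal is formatted again
hadoop-daemon.sh start namenode
```

Note that -initializeSharedEdits reinitializes the shared journal from the local NameNode's edits, so journal segments written after the last local checkpoint can be discarded in the process.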
But then the Flink JobManager also failed to start, with this error:
ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take leadership with session id xxxxxxxxxxxxxxxxxxxxxxxxxx
...
Caused by: java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /xxxxx. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
...
Caused by: java.io.FileNotFoundException: File does not exist: /xxxx/submittedJobGraphe439cfc979db
A job was running when the node was rebooted, and the initializeSharedEdits step just performed on the JournalNode lost some files, so the JobManager hit this error while reading the submitted job. The fix is to delete the job's references in ZooKeeper.
./zkCli.sh -server <zookeeper host>
set /flink/default/running_job_registry/xxxxx DONE
delete /flink/default/jobgraphs/xxxx
After this, the JobManager and TaskManager restarted without problems, and new jobs could be submitted again.
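If the broken job id is not known in advance, it can be found by listing the znodes before deleting. A minimal sketch of the zkCli session follows; zk-host:2181 is a placeholder address, and the /flink/default prefix assumes Flink's default high-availability.zookeeper.path.root and cluster-id settings, so adjust both to your configuration.

```shell
# Connect to the ZooKeeper ensemble used by Flink HA (placeholder address)
./zkCli.sh -server zk-host:2181

# Inside the zkCli shell: list registered jobs to find the broken job id
ls /flink/default/jobgraphs
ls /flink/default/running_job_registry

# Mark the job as DONE so the dispatcher stops trying to recover it,
# then remove the stale job graph reference
set /flink/default/running_job_registry/<job-id> DONE
delete /flink/default/jobgraphs/<job-id>
```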