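Problem Description
A couple of days ago, after I finished setting up a Hadoop cluster, I ran into a problem:
starting Hadoop from the master and then running jps showed that NameNode, JobTracker, and SecondaryNameNode were all up,
but the DataNode on the slave either failed to start or shut itself down a few seconds after starting, so I went to the log for answers.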
[root@slave1 hadoop-1.0.4]# cat logs/hadoop-root-datanode-slave1.log
2013-01-30 07:13:36,699 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = slave1/10.0.3.124
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
2013-01-30 07:13:36,913 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-01-30 07:13:36,924 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-01-30 07:13:36,925 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-01-30 07:13:36,925 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-01-30 07:13:37,031 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-01-30 07:13:38,357 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 0 time(s).
2013-01-30 07:13:39,359 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 1 time(s).
2013-01-30 07:13:40,360 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 2 time(s).
2013-01-30 07:13:41,373 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 3 time(s).
2013-01-30 07:13:43,376 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 4 time(s).
2013-01-30 07:13:44,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 5 time(s).
2013-01-30 07:13:45,379 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 6 time(s).
2013-01-30 07:13:46,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 7 time(s).
2013-01-30 07:13:47,381 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 8 time(s).
2013-01-30 07:13:48,383 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.0.3.123:9000. Already tried 9 time(s).
2013-01-30 07:13:48,385 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/10.0.3.123:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
    at org.apache.hadoop.ipc.Client.call(Client.java:1075)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy5.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
    at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
    at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
    at org.apache.hadoop.ipc.Client.call(Client.java:1050)
    ... 14 more
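With a log this long, a small filter can pull out the lines that matter. The following is a minimal sketch, assuming the log file on disk has one entry per line; `summarize_datanode_log` is a hypothetical helper name, not something shipped with Hadoop.

```shell
# Print the number of connection retries recorded in a DataNode log,
# followed by every ERROR line and every "Caused by:" line.
# (summarize_datanode_log is a hypothetical helper for illustration.)
summarize_datanode_log() {
    local log_file="$1"
    local retries
    # grep -c prints the count of matching lines (0 if there are none)
    retries=$(grep -c 'Retrying connect to server' "$log_file")
    echo "retries: $retries"
    grep -E ' ERROR |Caused by:' "$log_file"
}
```

It could be called on the slave as `summarize_datanode_log logs/hadoop-root-datanode-slave1.log`.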
Problem Analysis
The log makes it clear that the slave tried to connect to the master 10 times, failed every time, and then threw java.net.NoRouteToHostException.
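This exception generally has one of two causes: the network path to the master is genuinely unreachable, or a firewall along the way is rejecting the connection.
So I set about fixing it.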
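Before changing anything, it can help to confirm from the slave whether the master's port is reachable at all. Here is a minimal sketch, assuming bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available; `can_connect` is a hypothetical helper name, and `master` and `9000` are the host and port taken from the log.

```shell
# Return 0 if a TCP connection to host:port succeeds within the timeout,
# non-zero otherwise. Relies on bash's /dev/tcp and coreutils `timeout`.
# (can_connect is a hypothetical helper for illustration.)
can_connect() {
    local host="$1" port="$2" secs="${3:-3}"
    # Inside the inner bash, $0 is the host and $1 is the port.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, run on the slave:

```shell
if can_connect master 9000; then
    echo "master:9000 is reachable"
else
    echo "master:9000 is NOT reachable"
fi
```

If `ping master` succeeds while this check fails, a firewall on the master is a likely suspect, though a NameNode that is not listening on that port would also make the check fail.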
Problem Solution
Turn off the firewall on the master.
Then restart Hadoop. The DataNode on the slave now starts successfully, and no errors appear in the log.
Problem solved.
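The post does not show the exact command used. As a rough sketch, assuming the master is a RHEL/CentOS 6 style system where the firewall runs as the `iptables` service (other distributions use different tools), it might look like the following. `fix_master_firewall` and the `DRY_RUN` variable are hypothetical names introduced here for illustration; the "open" mode is a narrower alternative to what the post did.

```shell
# Assumption: RHEL/CentOS 6 style master with the "iptables" service.
# Run as root on the master. Set DRY_RUN=1 to print the commands
# instead of executing them.
# (fix_master_firewall is a hypothetical helper for illustration.)
fix_master_firewall() {
    local mode="${1:-stop}"       # "stop": what the post did; "open": allow one port only
    local port="${2:-9000}"       # NameNode RPC port seen in the log
    local run="${DRY_RUN:+echo}"  # expands to "echo" only when DRY_RUN is set
    if [ "$mode" = "open" ]; then
        $run iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
        $run service iptables save
    else
        $run service iptables stop
    fi
}
```

Calling `DRY_RUN=1 fix_master_firewall` prints `service iptables stop` without touching the system. Stopping the service this way lasts only until the next reboot, and Hadoop uses several other ports besides 9000, so opening that single port may not be enough on its own.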
Problem Summary
The firewall, yet again - -#