org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues

Posted by 本小妞迷上赌 on 2020-08-18 07:33:44

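After configuring a YARN HA (high-availability) cluster, I ran start-yarn.sh and found that the NodeManager started successfully but the ResourceManager did not.
So I checked the logs: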

tail -n 100 hadoop-root-resourcemanager-hadoop01.log

The ResourceManager had hit this error during startup:

org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:884)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1296)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1584)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:757)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:342)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:418)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	... 7 more
Caused by: java.lang.IllegalStateException: Queue configuration missing child queue names for root
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:234)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(CapacitySchedulerQueueManager.java:162)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:748)
	... 10 more
2020-08-16 11:56:56,428 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
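
In a chained Java stack trace like the one above, the last "Caused by:" entry is usually the most specific one. The sketch below shows one way to pull that line out of a log excerpt. The helper name root_cause is hypothetical and not part of Hadoop. It only looks at the "Caused by:" prefix that appears in the trace above.

```python
from typing import Optional


def root_cause(log_text: str) -> Optional[str]:
    """Return the text of the last 'Caused by:' line, or None if there is none.

    Hypothetical helper for illustration; it does not parse full Java traces.
    """
    prefix = "Caused by:"
    cause = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Later entries overwrite earlier ones, so the innermost cause wins.
            cause = stripped[len(prefix):].strip()
    return cause


# Excerpt taken from the ResourceManager log shown above.
sample = """\
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
\t... 7 more
Caused by: java.lang.IllegalStateException: Queue configuration missing child queue names for root
\t... 10 more
"""

print(root_cause(sample))
```

For this excerpt the helper returns the IllegalStateException line, which names the root queue's missing child queue list rather than a generic initialization failure.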

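My first guess was that something had not been initialized, so I reformatted the HA state stored in ZooKeeper, deleted the data and logs directories, and reformatted the NameNode. After restarting, the problem was still there. That pointed to YARN's configuration files. The innermost cause in the log is "Queue configuration missing child queue names for root", and the trace mentions CapacityScheduler: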

at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:757)

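So I looked in etc/hadoop/ and found no capacity-scheduler.xml there. Then I remembered that I had cloned this cluster from a NameNode host, which did not carry the ResourceManager's configuration files. I dug the capacity-scheduler.xml out of the ResourceManager node of the previously cloned cluster, copied it into the corresponding directory on hadoop01 and hadoop02, reformatted the HA record in ZooKeeper, and ran start-dfs.sh and then start-yarn.sh.
Success: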

[root@hadoop01 logs]# jpsall
=============== hadoop01 ===============
85060 NameNode
85494 DFSZKFailoverController
85177 DataNode
85771 ResourceManager
79450 QuorumPeerMain
85343 JournalNode
85903 NodeManager
=============== hadoop02 ===============
67841 DataNode
68451 DFSZKFailoverController
62740 QuorumPeerMain
68181 JournalNode
69029 ResourceManager
69224 NodeManager
67643 NameNode
=============== hadoop03 ===============
64498 NameNode
64790 JournalNode
64618 DataNode
65212 NodeManager
60653 QuorumPeerMain
64942 DFSZKFailoverController
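
The diagnosis above (a missing capacity-scheduler.xml, so the root queue has no child queues) can be checked before starting the ResourceManager. The following is a minimal sketch, assuming the standard Hadoop configuration layout of <property><name>/<value> elements and the CapacityScheduler property yarn.scheduler.capacity.root.queues. The function name check_capacity_scheduler is hypothetical, and the check covers only this one failure mode, not a full validation of the scheduler configuration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Property that lists the child queues of the root queue.
ROOT_QUEUES_KEY = "yarn.scheduler.capacity.root.queues"


def check_capacity_scheduler(conf_dir):
    """Return a list of problems found; an empty list means this basic check passed.

    Hypothetical helper: conf_dir is the Hadoop configuration directory,
    for example the etc/hadoop directory mentioned above.
    """
    path = Path(conf_dir) / "capacity-scheduler.xml"
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        root = ET.parse(str(path)).getroot()
    except ET.ParseError as exc:
        return [f"{path} is not well-formed XML: {exc}"]

    # Collect name -> value pairs from every <property> element.
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()

    if not props.get(ROOT_QUEUES_KEY):
        return [f"{ROOT_QUEUES_KEY} is missing or empty in {path}"]
    return []
```

Calling check_capacity_scheduler with the configuration directory of each ResourceManager host (hadoop01 and hadoop02 here) would have reported the missing file without a trip through the startup log.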

Check the ResourceManager's HA state:

[root@hadoop01 logs]# yarn rmadmin -getServiceState rm1
active

Then look at YARN's leader-election record from the ZooKeeper client:

[zk: localhost:2181(CONNECTED) 0] get -s /yarn-leader-election/cluster-yarn1/ActiveStandbyElectorLock

cluster-yarn1rm1
cZxid = 0x40000002d
ctime = Sun Aug 16 12:25:44 CST 2020
mZxid = 0x40000002d
mtime = Sun Aug 16 12:25:44 CST 2020
pZxid = 0x40000002d
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x300037aefe30005
dataLength = 20
numChildren = 0