How to make HDFS work in docker swarm


Question


I'm having trouble getting my HDFS setup to work in Docker Swarm. To understand the problem, I've reduced my setup to the minimum:

  • 1 physical machine
  • 1 namenode
  • 1 datanode

This setup works fine with docker-compose, but it fails with Docker Swarm using the same compose file.

Here is the compose file:

version: '3'
services:
  namenode:
      image: uhopper/hadoop-namenode
      hostname: namenode
      ports:
        - "50070:50070"
        - "8020:8020"
      volumes:
        - /userdata/namenode:/hadoop/dfs/name
      environment:
        - CLUSTER_NAME=hadoop-cluster

  datanode:
    image: uhopper/hadoop-datanode
    depends_on:
      - namenode
    volumes:
      - /userdata/datanode:/hadoop/dfs/data
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020

To test it, I have installed a Hadoop client on my host (physical) machine with only this simple configuration in core-site.xml:

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://0.0.0.0:8020</value></property>
</configuration>

Then I run the following command:

hdfs dfs -put test.txt /test.txt

With docker-compose (just running docker-compose up) it's working and the file is written in HDFS.
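
As a quick sanity check (plain HDFS shell commands, nothing specific to this setup), the write can be confirmed from the same client:

# List the uploaded file and read it back
hdfs dfs -ls /test.txt
hdfs dfs -cat /test.txt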

With docker-swarm, I'm running :

docker swarm init 
docker stack deploy --compose-file docker-compose.yml hadoop
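
To confirm that the stack actually came up before testing, the usual swarm commands can be used (the service names hadoop_namenode and hadoop_datanode follow from the stack name hadoop used above):

# List the stack's services and their replica counts
docker service ls
# Show where the datanode task landed and check the namenode's logs
docker service ps hadoop_datanode
docker service logs hadoop_namenode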

Then, when all the services are up, putting my file on HDFS fails like this:

INFO hdfs.DataStreamer: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/x.x.x.x:50010]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1692)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1648)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
18/06/14 17:29:41 WARN hdfs.DataStreamer: Abandoning BP-1801474405-10.0.0.4-1528990089179:blk_1073741825_1001
18/06/14 17:29:41 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.0.0.6:50010,DS-d7d71735-7099-4aa9-8394-c9eccc325806,DISK]
18/06/14 17:29:41 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

If I look in the web UI, the datanode seems to be up and no issue is reported...

Update: it seems that depends_on is ignored by Swarm, but that does not seem to be the cause of my problem: I restarted the datanode after the namenode was up, but it did not work any better.

Thanks for your help :)


Answer 1:


The whole mess stems from the interaction between Docker Swarm's overlay networks and the way the HDFS namenode keeps track of its datanodes. The namenode records the datanodes' IPs/hostnames based on their overlay-network addresses. When the HDFS client asks to read from or write to the datanodes directly, the namenode reports back those overlay-network IPs/hostnames. Since the overlay network is not reachable by external clients, any read/write operation will fail.
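
A quick way to see this in practice from the host, sketched under the assumptions that the stack was deployed as hadoop (so the default overlay network is hadoop_default) and that the namenode RPC port is published as in the question:

# Ask the namenode which datanode addresses it will hand out to clients.
# -report needs HDFS superuser rights; HADOOP_USER_NAME=root assumes the
# containers run Hadoop as root (adjust to your image).
HADOOP_USER_NAME=root hdfs dfsadmin -report

# Compare with the containers' addresses on the stack's overlay network;
# these IPs are not routable from outside the swarm, which is why the
# client times out when it contacts the datanode directly.
docker network inspect hadoop_default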

The final solution I used (after a lot of struggling to get the overlay network to work) was to have the HDFS services use the host network. Here's a snippet from the compose file:

version: '3.7'

x-deploy_default: &deploy_default
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == manager
  restart_policy:
    condition: any
    delay: 5s

services:
  hdfs_namenode:
    deploy:
      <<: *deploy_default
    networks:
      hostnet: {}
    volumes:
      - hdfs_namenode:/hadoop-3.2.0/var/name_node
    command:
      namenode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0

  hdfs_datanode:
    deploy:
      mode: global
    networks:
      hostnet: {}
    volumes:
      - hdfs_datanode:/hadoop-3.2.0/var/data_node
    command:
      datanode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0

volumes:
  hdfs_namenode:
  hdfs_datanode:

networks:
  hostnet:
    external: true
    name: host
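
One possible way to deploy this snippet (a sketch: PRIMARY_HOST is a variable used by this compose file and must resolve to an address that external HDFS clients can reach, e.g. the swarm manager's hostname; docker stack deploy may not interpolate ${...} variables itself, so the file can be expanded first, for example with envsubst from GNU gettext):

# Expand ${PRIMARY_HOST} before handing the compose file to the swarm
export PRIMARY_HOST=$(hostname -f)
envsubst < docker-compose.yml | docker stack deploy --compose-file - hadoop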


Source: https://stackoverflow.com/questions/50861281/how-to-make-hdfs-work-in-docker-swarm
