Launching a simple Python script on an AWS Ray cluster with Docker

Posted by 橙三吉。 on 2021-01-07 01:30:54

Question


I am finding it incredibly difficult to follow Ray's guidelines for running a Docker image on a Ray cluster in order to execute a Python script, and I cannot find simple working examples.

So I have the simplest Dockerfile:

FROM rayproject/ray
WORKDIR /usr/src/app
COPY . .
CMD ["step_1.py"]
ENTRYPOINT ["python3"]

I use this to create an image and push it to Docker Hub ("myimage" is just an example):

docker build -t myimage .   
docker push myimage

"step_1.py" just prints hello every second for 200 seconds:

import time
for i in range(200):
    time.sleep(1)
    print("hello")

This is my config.yaml, again very simple:

cluster_name: simple-1

min_workers: 0
max_workers: 2

docker:
    image: "myimage"    
    container_name: "my_simple_docker_container"
    pull_before_run: True

idle_timeout_minutes: 5

provider:
    type: aws
    region: eu-west-2
    availability_zone: eu-west-2a

file_mounts_sync_continuously: False



auth:
    ssh_user: ubuntu
    ssh_private_key: /home/user/.ssh/aws_ubuntu_test.pem
head_node:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxx826a6b31fd2c
    KeyName: aws_ubuntu_test

    BlockDeviceMappings:
      - DeviceName: /dev/sda1
        Ebs:
          VolumeSize: 200

worker_nodes:
   InstanceType: c5.2xlarge
   ImageId: ami-xxxxx826a6b31fd2c
   KeyName: aws_ubuntu_test
   InstanceMarketOptions:
        MarketType: spot

head_setup_commands:
    - pip install boto3==1.4.8

worker_setup_commands:  []

head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

I run this in the terminal:

ray up simple1.yaml

and get this error every time:

shared connection to x.x.xx.119 closed.
"docker cp" requires exactly 2 arguments.
See 'docker cp --help'.

Usage:  docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
        docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH

Copy files/folders between a container and the local filesystem
Shared connection to x.x.xx.119 closed.

Just to add, the Docker image runs fine on any other remote machine, just not on the Ray cluster.

If someone could please help me, I would be eternally grateful, and I will even promise to add a tutorial on Medium after my struggles.


Answer 1:


I think the issue might be around using ENTRYPOINT. The Ray ClusterLauncher starts docker using a command roughly like:

docker run --rm --name <NAME> -d -it --net=host <image_name> bash

When I ran docker build -t myimage . and then ran docker run --rm -it myimage bash, Docker errored with:

python3: can't open file 'bash': [Errno 2] No such file or directory
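
That error follows from how Docker combines ENTRYPOINT, CMD and the arguments given after the image name: the trailing bash replaces CMD and is then appended to ENTRYPOINT, so the container tries to run python3 bash. A minimal Python sketch of that rule is below. The function effective_command is a hypothetical helper written for illustration, not part of Docker or Ray, and it assumes exec-form instructions and a base image that defines no ENTRYPOINT of its own.

```python
from typing import List, Optional


def effective_command(entrypoint: Optional[List[str]],
                      cmd: Optional[List[str]],
                      run_args: List[str]) -> List[str]:
    """Return the argv a container starts with (exec-form ENTRYPOINT/CMD).

    Hypothetical helper for illustration only. Arguments given after the
    image name on `docker run` replace CMD, and the result is appended
    to ENTRYPOINT.
    """
    args = run_args if run_args else (cmd or [])
    return (entrypoint or []) + args


# Image from the question: ENTRYPOINT ["python3"], CMD ["step_1.py"],
# started the way the answer describes (trailing `bash`).
question_image = effective_command(["python3"], ["step_1.py"], ["bash"])
print(question_image)  # ['python3', 'bash']

# Assumed alternative: no ENTRYPOINT, CMD ["python3", "step_1.py"].
alternative_image = effective_command(None, ["python3", "step_1.py"], ["bash"])
print(alternative_image)  # ['bash']

# With no extra arguments the question's image still runs the script.
print(effective_command(["python3"], ["step_1.py"], []))  # ['python3', 'step_1.py']
```

If this reading is right, dropping the ENTRYPOINT line and using CMD ["python3", "step_1.py"] instead would let the launcher's trailing bash start a shell, while a plain docker run myimage would still run the script. Whether that alone clears the "docker cp" error has not been verified here.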


Source: https://stackoverflow.com/questions/65570374/launching-a-simple-python-script-on-an-aws-ray-cluster-with-docker
