Why is copying a directory with Ansible so slow?

野趣味 · asked 2020-12-23 13:07 · 4 answers · 810 views

I'm using Ansible to copy a directory (900 files, 136 MB) from one host to another:

---
- name: copy a directory
  copy: src={{some_directory}} dest={{remote_directory}}
4 Answers
  • 2020-12-23 13:35

    TLDR: use synchronize instead of copy.

    Here's the copy command I'm using:

    - copy: src=testdata dest=/tmp/testdata/
    

    As a guess, I assume the sync operations are slow. The files module documentation implies this too:

    The "copy" module recursively copy facility does not scale to lots (>hundreds) of files. For alternative, see synchronize module, which is a wrapper around rsync.

    Digging into the source shows each file is processed with SHA1, implemented via hashlib.sha1. A local test suggests hashing only takes about 10 seconds for 900 files (which happen to occupy 400 MB).

    So, the next avenue: the copy itself is handled by the atomic_move method in module_utils/basic.py. I'm not sure whether accelerated mode would help (it's a mostly-deprecated feature), but I tried pipelining, putting this in a local ansible.cfg:

    [ssh_connection]
    pipelining=True
    

    It didn't appear to help; my sample still took 24 minutes to run. There's evidently a loop that checksums a file, uploads it, fixes permissions, then moves on to the next file. That's a lot of commands, even if the SSH connection is left open. Reading between the lines it makes some sense: the file transfer itself can't be done over pipelining, I think.

    So, following the hint to use the synchronize command:

    - synchronize: src=testdata dest=/tmp/testdata/
    

    That took 18 seconds, even with pipelining=False. Clearly, the synchronize module is the way to go in this case.

    Keep in mind synchronize uses rsync, which by default compares modification time and file size. If you want or need checksumming, add checksum=True to the task. Even with checksumming enabled the time didn't really change: still 15-18 seconds. I verified the checksum option was on by running ansible-playbook with -vvvv, as can be seen here:

    ok: [testhost] => {"changed": false, "cmd": "rsync --delay-updates -FF --compress --checksum --archive --rsh 'ssh  -o StrictHostKeyChecking=no' --out-format='<<CHANGED>>%i %n%L' \"testdata\" \"user@testhost:/tmp/testdata/\"", "msg": "", "rc": 0, "stdout_lines": []}
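
    For completeness, here is the same task in modern YAML block syntax with checksumming on; a minimal sketch, assuming the ansible.posix collection (the module's current home) is installed:

    - name: sync a directory, comparing checksums instead of mtime/size
      ansible.posix.synchronize:
        src: testdata
        dest: /tmp/testdata/
        checksum: true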
    
  • 2020-12-23 13:44

    The synchronize module can be difficult to configure in environments that use become_user. For one-time deployments you can archive the source directory and copy it with the unarchive module:

    - name: copy a directory
      unarchive:
        src: some_directory.tar.gz
        dest: "{{remote_directory}}"
        creates: "{{remote_directory}}/indicator_file"
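
    The tarball itself still has to be built; a minimal sketch of that step, assuming the community.general collection is available (the source path is hypothetical):

    - name: build the archive on the control node
      community.general.archive:
        path: /path/to/some_directory   # hypothetical source path
        dest: some_directory.tar.gz
        format: gz
      delegate_to: localhost

    unarchive then copies the archive from the control node and extracts it on the target in a single task, and creates: keeps the play idempotent on reruns.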
    
  • 2020-12-23 13:58

    The best solution I have found is to compress the folder and use the unarchive module.

    A 450 MB folder finished in 1 minute.

    - unarchive:
        src: /home/user/folder1.tar.gz
        dest: /opt
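
    The compression step isn't shown; a minimal sketch using the command module, assuming tar is available on the control node (paths taken from the task above):

    - name: tar up the folder locally before shipping it
      command: tar -czf /home/user/folder1.tar.gz -C /home/user folder1
      args:
        creates: /home/user/folder1.tar.gz   # skip if the archive already exists
      delegate_to: localhost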
    
  • 2020-12-23 13:58

    While synchronize is preferable to copy in this case, it is backed by rsync, so rsync's drawbacks (a client-server architecture) remain as well: CPU and disk bottlenecks, slow in-file delta calculations for large files, and so on. It sounds like speed is critical for you, so I would suggest you look for a solution based on a peer-to-peer architecture, which is fast and scales easily to many machines, e.g. something BitTorrent-based such as Resilio Connect.
