Copy file in Python with copy-on-write (COW)

Posted by 微笑、不失礼 on 2021-02-05 11:31:39

Question


My filesystem (FS), ZFS specifically, supports copy-on-write (COW), i.e. a copy, if done right, is a very cheap constant-time operation that does not actually copy the underlying content. The content is copied only once I write to or modify the new file.

Actually, I just found out that ZFS-on-Linux has not implemented that for userspace yet (right?). But e.g. Btrfs and XFS have. (See here, here, here, here.)

For the (GNU) cp utility, you would pass the --reflink=always option (see here). cp calls ioctl(dest_fd, FICLONE, src_fd) (see here, here).

How would I get this behavior (if possible) in Python?

I assume that "zero-copy" (e.g. here via os.sendfile) would not result in such behavior, right? Looking at shutil's _fastcopy_sendfile implementation (here), it is still a loop around os.sendfile using a custom byte count (supposed to be the block size, max(os.fstat(infd).st_size, 2 ** 23)). Or would it?

Is the COW done at the file level or at the block level?

If possible, I want this to be generic and cross-platform as well, although my question here is somewhat Linux-focused. A related question specifically about Mac seems to be this. The macOS cp has a -c option to clone a file.


Answer 1:


While searching further, I actually found the answer, and a related issue report.

Issue 37157 (shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range)) is exactly about that, which would use FICLONE/FICLONERANGE on Linux.

So I assume that shutil will support this in an upcoming Python version (maybe starting with Python 3.9?).

There is also os.copy_file_range (since Python 3.8), which wraps the Linux copy_file_range system call.

However, in issue 37159 (Use copy_file_range() in shutil.copyfile() (server-side copy)), Giampaolo Rodola writes:
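As a rough illustration of how os.copy_file_range could be called, here is a minimal sketch. The helper name copy_via_copy_file_range and the fallback to shutil.copyfile are my own assumptions, not part of shutil. Whether the kernel actually shares blocks (reflink) or just copies them in kernel space depends on the filesystem and kernel version, so this does not guarantee COW.

```python
import os
import shutil


def copy_via_copy_file_range(src, dst):
    """Copy src to dst, preferring os.copy_file_range (Linux, Python 3.8+).

    Hypothetical helper for illustration. Returns the name of the method used.
    """
    if hasattr(os, "copy_file_range"):
        try:
            with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
                remaining = os.fstat(fsrc.fileno()).st_size
                while remaining > 0:
                    # The kernel may copy fewer bytes than requested, so loop.
                    n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), remaining)
                    if n == 0:  # source ended early, e.g. it was truncated
                        break
                    remaining -= n
            return "copy_file_range"
        except OSError:
            # e.g. EXDEV, ENOSYS or EINVAL: fall through to a plain copy
            pass
    shutil.copyfile(src, dst)
    return "copyfile"
```

Calling copy_via_copy_file_range("a.bin", "b.bin") returns either "copy_file_range" or "copyfile", depending on what the platform supports.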

Nope, [copy_file_range] doesn't [support CoW] (see man page). We can simply use FICLONE (cp does the same).

However, I'm not sure this is correct, as the copy_file_range man page says:

copy_file_range() gives filesystems an opportunity to implement "copy acceleration" techniques, such as the use of reflinks (i.e., two or more inodes that share pointers to the same copy-on-write disk blocks) or server-side-copy (in the case of NFS).

Issue 26826 (Expose new copy_file_range() syscall in os module) has this comment by Giampaolo Rodola:

I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). ...
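Until shutil offers this directly, the FICLONE ioctl mentioned above can be issued from Python with fcntl.ioctl. The following is only a sketch under a few assumptions: it is Linux-only, the helper name reflink_or_copy is made up for illustration, and the numeric value of FICLONE is hard-coded from <linux/fs.h> as a fallback in case the fcntl module of the running Python version does not expose the constant.

```python
import fcntl
import shutil

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>. Newer Python versions may
# expose it as fcntl.FICLONE; otherwise fall back to the numeric value.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_or_copy(src, dst):
    """Try to clone src into dst via FICLONE; fall back to a regular copy.

    Hypothetical helper for illustration. Returns True if the clone
    succeeded and False if a plain copy was made instead.
    """
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            # Same call that cp --reflink makes: ioctl(dest_fd, FICLONE, src_fd)
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        return True
    except OSError:
        # Typical failures: the filesystem cannot clone (EOPNOTSUPP, ENOTTY,
        # EINVAL) or src and dst are on different filesystems (EXDEV).
        shutil.copyfile(src, dst)
        return False
```

On a filesystem with reflink support (e.g. Btrfs, or XFS created with reflink enabled) the function should return True; elsewhere it silently degrades to an ordinary copy, which roughly mirrors cp --reflink=auto rather than --reflink=always.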



Source: https://stackoverflow.com/questions/65492317/copy-file-in-python-with-copy-on-write-cow
