Python shutil copyfile - missing last few lines

偶尔善良 submitted on 2019-12-10 13:16:54

Question


I am routinely missing the last few KB of a file that I copy with shutil.copyfile.

I did some research and do see someone asking about something similar here: python shutil copy function missing last few lines

But I am using copyfile, which DOES seem to use a with statement...

with open(src, 'rb') as fsrc:
    with open(dst, 'wb') as fdst:
        copyfileobj(fsrc, fdst)

So I am perplexed that more users aren't hitting this. If it really were some sort of buffering issue, I would expect it to be better known.

I am calling copyfile very simply, don't think I could possibly be doing something wrong, essentially doing it the standard way I think:

copyfile(target_file_name,dest_file_name) 

Yet I am missing the last 4 KB or so of the file each time.

I have also not touched the copyfileobj function that copyfile calls in shutil, which is...

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

So I am at a loss, but I suppose I am about to learn something about flushing, buffering, or the with statement, or ... Help! thanks


To Anand: I avoided mentioning that stuff because my sense is that it's not the problem, but since you asked... The executive summary is that I am grabbing a file from an FTP server and checking whether it differs from the last copy I saved; if so, I save a new copy. It's circuitous spaghetti code, written when I was a purely utilitarian novice of a coder. It looks like:

for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
        the_files.append(target_file_name)
        mtime = modification_date(target_file_name)
        mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16]    + str(mtime)[17:19] + str(mtime)[20:28]#2014-12-11 15:08:00.338415.
        sorted_xml_files = [file for file in glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))]
        sorted_xml_files.sort(key=os.path.getmtime)
        last_file = sorted_xml_files[-1]
        file_is_the_same = filecmp.cmp(target_file_name, last_file)
        if not file_is_the_same:
            print 'File changed!'
            copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml') 
        else:
            print 'File '+ last_file +' hasn\'t changed, doin nothin'
            continue

Answer 1:


The issue is most probably in this line:

ftp.retrbinary('RETR %s' % filename, fhandle.write)

It uses fhandle.write() to write the data from the FTP server to the file named target_file_name. That write is buffered, so by the time you call shutil.copyfile the buffer for fhandle has not been completely flushed, and whatever is still sitting in it is missing from the copy.

To make sure this does not happen, you can move the copyfile logic out of the with block for fhandle, so that the file is closed before it is copied.

Or you can call fhandle.flush() to flush the buffer before copying the file.

I believe closing the file (moving the logic out of the with block) is the better option. Example:

for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
    the_files.append(target_file_name)
    mtime = modification_date(target_file_name)
    mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16]    + str(mtime)[17:19] + str(mtime)[20:28]#2014-12-11 15:08:00.338415.
    sorted_xml_files = [file for file in glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))]
    sorted_xml_files.sort(key=os.path.getmtime)
    last_file = sorted_xml_files[-1]
    file_is_the_same = filecmp.cmp(target_file_name, last_file)
    if not file_is_the_same:
        print 'File changed!'
        copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml') 
    else:
        print 'File '+ last_file +' hasn\'t changed, doin nothin'
        continue
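
The buffering effect can be reproduced without an FTP server. The following is a minimal sketch in Python 3 syntax (the code above is Python 2). The function name copy_sizes and the file names are made up for illustration, and the payload is deliberately small so that it fits inside the write buffer of fhandle:

```python
import os
import shutil
import tempfile


def copy_sizes(payload=b"x" * 100):
    """Copy a file before flush, after flush, and after close.

    Returns the sizes of the three copies, in that order.
    """
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "src.bin")
    early = os.path.join(workdir, "early.bin")
    flushed = os.path.join(workdir, "flushed.bin")
    closed = os.path.join(workdir, "closed.bin")
    try:
        with open(src, "wb") as fhandle:
            fhandle.write(payload)         # small write: stays in fhandle's buffer
            shutil.copyfile(src, early)    # copies only what has reached the file so far
            fhandle.flush()                # push the buffered bytes out to the file
            shutil.copyfile(src, flushed)
        shutil.copyfile(src, closed)       # fhandle is closed, so nothing is left in its buffer
        return tuple(os.path.getsize(p) for p in (early, flushed, closed))
    finally:
        shutil.rmtree(workdir)


early_size, flushed_size, closed_size = copy_sizes()
print(early_size, flushed_size, closed_size)
```

The first size should come out smaller than the other two, which both match the payload length. With a larger download, whatever is still in the buffer at copy time (the tail of the file) is the part that goes missing.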



Answer 2:


You are trying to copy a file that has not been closed yet, which is why its buffers have not been flushed. Move the copyfile call out of the with block so that fhandle is closed first.

Do:

with open(target_file_name ,'wb') as fhandle:
    ftp.retrbinary('RETR %s' % filename, fhandle.write)

# and here the rest of your code
# so fhandle is closed, and file is stored completely on the disk
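
One way to keep the copy from creeping back inside the with block is to put the download in its own function. This is only a sketch in Python 3 syntax: save_if_changed and its parameters are hypothetical names, and fetch stands in for a call such as lambda cb: ftp.retrbinary('RETR %s' % filename, cb):

```python
import filecmp
import shutil


def save_if_changed(fetch, local_path, archive_path, previous_path=None):
    """Download to local_path, then archive it unless it matches previous_path.

    fetch is any callable that takes a write callback and feeds it the
    downloaded bytes (a stand-in for ftp.retrbinary in the question).
    Returns True if a new archive copy was written.
    """
    with open(local_path, "wb") as fhandle:
        fetch(fhandle.write)
    # The with block has ended: fhandle is closed and its buffer flushed,
    # so the comparison and the copy below both see the whole file.
    if previous_path is not None and filecmp.cmp(local_path, previous_path, shallow=False):
        return False
    shutil.copyfile(local_path, archive_path)
    return True
```

Because the only statement inside the with block is the download itself, everything that reads the file back necessarily runs after it is closed.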



Answer 3:


It looks like there is a better way to write the nested with statements:

with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
    copyfileobj(fsrc, fdst)

I'd try something more like this. I'm far from an expert, so hopefully someone more knowledgeable can lend some insight. My best thought is that the inner with closes before the outer one.
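
The closing order is easy to check. In the small Python 3 sketch below, tracked is a made-up helper that records when each context manager is entered and exited, once for the nested form and once for the single-line form:

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    """Record when the context named `name` is entered and exited."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


# Nested form, as in shutil.copyfile.
with tracked("outer"):
    with tracked("inner"):
        pass
nested_order = list(events)

# Single-line form.
events.clear()
with tracked("outer"), tracked("inner"):
    pass
combined_order = list(events)

print(nested_order)
print(nested_order == combined_order)
```

Both forms record the same sequence, with the inner context exiting before the outer one, so in either case both files are closed by the time the block ends.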



Source: https://stackoverflow.com/questions/31546902/python-shutil-copyfile-missing-last-few-lines
