Retrieve data from gz file on FTP server without writing it locally

别来无恙 submitted on 2019-12-21 06:30:20

Question


I would like to retrieve the data inside a compressed gz file stored on an FTP server, without writing the file to the local archive.

At the moment I have done:

from ftplib import FTP
import gzip

ftp = FTP('ftp.server.com')
ftp.login()
ftp.cwd('/a/folder/')

fileName = 'aFile.gz'

# download the archive to a local file
localfile = open(fileName, 'wb')
ftp.retrbinary('RETR ' + fileName, localfile.write, 1024)
localfile.close()

# reopen the downloaded file by name and decompress it
f = gzip.open(fileName, 'rb')
data = f.read()

This, however, writes the file "localfile" on the current storage.

I tried to change this in

from ftplib import FTP
import zlib

ftp = FTP('ftp.server.com')
ftp.login()  
ftp.cwd('/a/folder/')

fileName = 'aFile.gz'

data = ftp.retrbinary('RETR '+fileName, zlib.decompress, 1024)

but ftp.retrbinary returns the transfer's status string, not the values returned by its callback. Is there a way to do this?


Answer 1:


A simple implementation is to:

  • download the file to an in-memory file-like object, such as BytesIO;

  • pass that to the fileobj parameter of the GzipFile constructor.

import gzip
from io import BytesIO
import shutil
from ftplib import FTP

ftp = FTP('ftp.example.com')
ftp.login('username', 'password')

flo = BytesIO()

ftp.retrbinary('RETR /remote/path/archive.tar.gz', flo.write)

flo.seek(0)

with open('archive.tar', 'wb') as fout, gzip.GzipFile(fileobj=flo) as gz:
    shutil.copyfileobj(gz, fout)

The above loads the whole .gz file into memory, which can be inefficient for large files. A smarter implementation would stream the data instead, but that would probably require implementing a smart custom file-like object.
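One way to sketch such streaming (my own addition, not part of the original answer) is to wrap zlib.decompressobj in a callable and hand it to retrbinary as the callback, so each network chunk is decompressed as it arrives and only the plain bytes are written out. The wbits value 16 + zlib.MAX_WBITS tells zlib to expect a gzip header; the GzipStreamWriter name and the FTP paths in the commented usage are hypothetical.

```python
import zlib


class GzipStreamWriter:
    """Decompress gzip data chunk by chunk as it arrives and
    write the plain bytes to an output file object."""

    def __init__(self, fout):
        # 16 + MAX_WBITS makes zlib expect a gzip header/trailer
        self._decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._fout = fout

    def __call__(self, chunk):
        # called by retrbinary for every received block
        self._fout.write(self._decompressor.decompress(chunk))

    def flush(self):
        # write any bytes still buffered inside the decompressor
        self._fout.write(self._decompressor.flush())


# Hypothetical usage against an FTP server:
# with open('archive.tar', 'wb') as fout:
#     writer = GzipStreamWriter(fout)
#     ftp.retrbinary('RETR /remote/path/archive.tar.gz', writer)
#     writer.flush()
```

This keeps memory use bounded by the block size, at the cost of a small helper class.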

See also Get files names inside a zip file on FTP server without downloading whole archive.



Source: https://stackoverflow.com/questions/52990046/retrieve-data-from-gz-file-on-ftp-server-without-writing-it-locally
