Read a file into a buffer from FTP in Python

谎友^ 2020-12-01 13:06

I am trying to read a file from an FTP server. The file is a .gz file. I would like to know if I can perform actions on this file while the socket is open. I tr

3 Answers
  • 2020-12-01 13:32

    Make sure to log in to the FTP server first. After that, use retrbinary, which retrieves the file in binary mode and calls a callback on each chunk of the file. You can use this callback to accumulate the chunks into a single string of bytes.

    from ftplib import FTP
    ftp = FTP('ftp.ncbi.nlm.nih.gov')
    ftp.login() # Username: anonymous password: anonymous@
    
    # Set up a cheap way to catch the data (could use io.BytesIO too)
    chunks = []
    def handle_binary(more_data):
        chunks.append(more_data)
    
    resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
    data = b"".join(chunks)  # the chunks are bytes, so join them with a bytes separator
    

    Bonus points: how about we decompress the string while we're at it?

    Easy mode, using the data string above:

    import gzip
    import io
    # Wrap the compressed bytes in a file-like object so gzip can read them
    zippy = gzip.GzipFile(fileobj=io.BytesIO(data))
    uncompressed_data = zippy.read()
    

    A little better, as a full solution:

    from ftplib import FTP
    import gzip
    import io
    
    ftp = FTP('ftp.ncbi.nlm.nih.gov')
    ftp.login() # Username: anonymous password: anonymous@
    
    # Collect the compressed bytes in an in-memory binary buffer
    buf = io.BytesIO()
    def handle_binary(more_data):
        buf.write(more_data)
    
    resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
    buf.seek(0) # Go back to the start
    zippy = gzip.GzipFile(fileobj=buf)
    
    uncompressed = zippy.read()
    

    In reality, it would be much better to decompress on the fly, but I don't see a way to do that with the built-in libraries (at least not easily).
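
    One possible way to decompress on the fly with only the standard library is zlib.decompressobj, which accepts compressed data in pieces. The following is a minimal sketch under stated assumptions, not a recipe verified against this server: make_gunzip_callback and sink are hypothetical names, the wbits value 16 + zlib.MAX_WBITS tells zlib to expect a gzip header, and only a single gzip member is handled.

```python
import zlib


def make_gunzip_callback(sink):
    """Return (callback, finish) for streaming gzip decompression.

    `sink` is any callable that accepts decompressed bytes
    (a hypothetical helper argument, not part of ftplib).
    """
    # 16 + MAX_WBITS makes zlib parse the gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def callback(chunk):
        # Intended to be called once per block received from the server.
        out = decompressor.decompress(chunk)
        if out:
            sink(out)

    def finish():
        # Flush whatever is still buffered once the transfer has ended.
        out = decompressor.flush()
        if out:
            sink(out)

    return callback, finish


# Possible usage with an already logged-in `ftp` object:
#   parts = []
#   callback, finish = make_gunzip_callback(parts.append)
#   ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback)
#   finish()
```

    Passing a different sink (for example a file's write method) would avoid holding the decompressed data in memory.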

  • 2020-12-01 13:41

    That is not possible. To process data on the server, you need to have some sort of execution permissions, be it for a shell script you would send or SQL access.

    FTP is pure file transfer; no execution is allowed. You will need to either enable SSH access, load the data into a database and query it there, or download the file with urllib and then process it locally, like this:

    from urllib.request import urlopen  # Python 3; on Python 2 use urllib.urlopen
    handle = urlopen('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
    # Use the data, maybe: buffer = handle.read()
    

    In particular, I think the third one is the only zero-effort solution.
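
    To illustrate the "process it locally" step for this particular file, here is a hedged sketch that takes the bytes returned by handle.read(), decompresses them, and parses the result as CSV. rows_from_gzipped_csv is a hypothetical helper, and treating the content as UTF-8 comma-separated text is an assumption based only on the file name.

```python
import csv
import gzip
import io


def rows_from_gzipped_csv(raw, encoding="utf-8"):
    """Decompress gzip bytes held in memory and return the CSV rows as lists."""
    text = gzip.decompress(raw).decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```

    For example, rows = rows_from_gzipped_csv(handle.read()). This keeps the whole file in memory, so it may not suit very large downloads.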

  • 2020-12-01 13:42

    There are two easy ways I can think of to download a file using FTP and store it locally:

    1. Using ftplib:

      from ftplib import FTP
      
      ftp = FTP('ftp.ncbi.nlm.nih.gov')
      ftp.login()
      ftp.cwd('pub/pmc')
      # Use a context manager so the local file is flushed and closed
      with open('PMC-ids.csv.gz', 'wb') as f:
          ftp.retrbinary('RETR PMC-ids.csv.gz', f.write)
      ftp.quit()
      
    2. Using urllib:

      # Python 3; on Python 2 use: from urllib import urlretrieve
      from urllib.request import urlretrieve
      
      urlretrieve("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz", "PMC-ids.csv.gz")
      

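    If you don't want to download the file and store it locally, but want to process it gradually as it arrives, I suggest using urlopen (from urllib2 on Python 2, from urllib.request on Python 3):

    from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
    
    u = urlopen("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/readme.txt")
    
    for line in u:
        # Lines arrive as bytes with the newline included (UTF-8 is assumed here)
        print(line.decode("utf-8"), end="")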
    

    which prints your file line by line.
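
    Since the file in the question is gzip-compressed, iterating over the response directly would yield compressed bytes rather than text lines. A minimal sketch of one way around that, assuming Python 3 (where gzip.GzipFile can read sequentially from a file-like object) and UTF-8 text: wrap the response and iterate over the wrapper. iter_gzip_lines is a hypothetical name.

```python
import gzip
import io


def iter_gzip_lines(fileobj, encoding="utf-8"):
    """Yield decoded lines from a gzip-compressed binary file-like object."""
    with gzip.GzipFile(fileobj=fileobj) as gz:
        # TextIOWrapper decodes bytes to str and splits on line endings.
        for line in io.TextIOWrapper(gz, encoding=encoding):
            yield line.rstrip("\n")


# Possible usage with the urlopen call shown above:
#   u = urlopen("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz")
#   for line in iter_gzip_lines(u):
#       print(line)
```

    Because the generator reads only as much as it needs for each line, the decompressed file never has to fit in memory at once.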
