I am trying to read a file from an FTP server. The file is a .gz
file. I would like to know if I can perform actions on this file while the socket is open. I tr
Make sure to log in to the FTP server first. After that, use retrbinary, which retrieves the file in binary mode and calls a callback on each chunk as it arrives. You can use this to load the file into a string.
from ftplib import FTP
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login() # Username: anonymous password: anonymous@
# Setup a cheap way to catch the data (could use StringIO too)
data = []
def handle_binary(more_data):
    data.append(more_data)
resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
data = "".join(data)
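The snippet above is Python 2, where each chunk is a str. On Python 3 the callback receives bytes, so the join needs a bytes separator. Here is a minimal sketch, assuming Python 3; the helper name fetch_bytes is made up for illustration, and it accepts any object that has a retrbinary method, such as a logged-in ftplib.FTP instance:

```python
def fetch_bytes(ftp, remote_path):
    """Download remote_path over an open FTP connection and return its bytes."""
    chunks = []
    # retrbinary calls the callback once per received block (bytes on Python 3)
    ftp.retrbinary("RETR " + remote_path, callback=chunks.append)
    # Join with a bytes separator; "".join would raise TypeError on Python 3
    return b"".join(chunks)
```

With a logged-in connection, fetch_bytes(ftp, "pub/pmc/PMC-ids.csv.gz") should return the compressed file as a single bytes object.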
Bonus points: how about we decompress the string while we're at it?
Easy mode, using the data string above:
import gzip
import StringIO
zippy = gzip.GzipFile(fileobj=StringIO.StringIO(data))
uncompressed_data = zippy.read()
Little bit better, full solution:
from ftplib import FTP
import gzip
import StringIO
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login() # Username: anonymous password: anonymous@
sio = StringIO.StringIO()
def handle_binary(more_data):
    sio.write(more_data)
resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
sio.seek(0) # Go back to the start
zippy = gzip.GzipFile(fileobj=sio)
uncompressed = zippy.read()
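For anyone on Python 3, the StringIO module is gone and the downloaded data is bytes, so the in-memory buffer would be io.BytesIO instead. A rough equivalent of the decompression step, assuming Python 3 (gunzip_bytes is a name I made up for this sketch):

```python
import gzip
import io


def gunzip_bytes(data):
    """Decompress a complete gzip payload held in memory."""
    # io.BytesIO plays the role that StringIO.StringIO plays in the Python 2 code
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as zippy:
        return zippy.read()
```

If the whole payload is already in memory, gzip.decompress(data) does the same thing in one call.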
In reality, it would be much better to decompress on the fly, but I don't see a way to do that with the built-in libraries (at least not easily).
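One possible way to decompress on the fly is zlib.decompressobj, which can be fed one chunk at a time. This is only a sketch, assuming Python 3 and a single-member gzip file; make_gzip_callback and sink are hypothetical names, and sink is any callable that accepts the decompressed bytes:

```python
import zlib


def make_gzip_callback(sink):
    """Return a retrbinary-style callback that gunzips each chunk into sink."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def handle_binary(chunk):
        # decompress() returns whatever output this chunk makes available
        out = decomp.decompress(chunk)
        if out:
            sink(out)

    return handle_binary
```

The returned function could be passed as the callback argument to retrbinary, so each block is decompressed as soon as it is received instead of after the whole transfer finishes.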
That is not possible. To process data on the server, you need to have some sort of execution permissions, be it for a shell script you would send or SQL access.
FTP is pure file transfer; no execution is allowed. You will need to either enable SSH access, load the data into a database and query it, or download the file with urllib and then process it locally, like this:
import urllib
handle = urllib.urlopen('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
# Use data, maybe: buffer = handle.read()
In particular, I think the third one is the only zero-effort solution.
There are two easy ways I can think of to download a file using FTP and store it locally:
Using ftplib:
from ftplib import FTP
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
ftp.cwd('pub/pmc')
ftp.retrbinary('RETR PMC-ids.csv.gz', open('PMC-ids.csv.gz', 'wb').write)
ftp.quit()
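The one-liner above never closes the local file explicitly. A small variation that does, as a sketch assuming Python 3 (download_to_file is a hypothetical helper name, and ftp is any logged-in connection that provides retrbinary):

```python
def download_to_file(ftp, remote_name, local_path):
    """Save remote_name to local_path, closing the local file when done."""
    # The with block guarantees the file is flushed and closed,
    # even if the transfer raises an exception part-way through
    with open(local_path, "wb") as fh:
        ftp.retrbinary("RETR " + remote_name, fh.write)
```

After ftp.cwd('pub/pmc'), calling download_to_file(ftp, 'PMC-ids.csv.gz', 'PMC-ids.csv.gz') would take the place of the retrbinary line.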
Using urllib:
from urllib import urlretrieve
urlretrieve("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz", "PMC-ids.csv.gz")
If you don't want to download and store it to a file, but you want to process it gradually as it comes, I suggest using urllib2:
from urllib2 import urlopen
u = urlopen("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/readme.txt")
for line in u:
    print line
which prints your file line by line.
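The same line-by-line idea can be extended to the compressed file from the question. This is a sketch only, assuming Python 3 and UTF-8 text; iter_gzip_lines is a made-up name, and stream can be any binary file-like object with a read method:

```python
import gzip
import io


def iter_gzip_lines(stream, encoding="utf-8"):
    """Yield decoded text lines from a gzip-compressed binary stream."""
    # GzipFile decompresses lazily as the wrapper asks for more data
    with gzip.GzipFile(fileobj=stream) as gz:
        for line in io.TextIOWrapper(gz, encoding=encoding):
            yield line.rstrip("\n")
```

On Python 3, urlopen lives in urllib.request, so the stream could in principle be the object returned by urllib.request.urlopen for the .gz URL. I have not checked this against the NCBI server, so treat that part as an assumption.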