Download files from an FTP server containing given string using Python

ε祈祈猫儿з submitted on 2020-04-04 10:12:06

Question


I'm trying to download a large number of files that all share a common string (DEM) from an FTP server. These files are nested inside multiple directories, for example Adair/DEM* and Adams/DEM*.

The FTP server is located here: ftp://ftp.igsb.uiowa.edu/gis_library/counties/ and requires no username and password. So, I'd like to go through each county and download the files containing the string DEM.

I've read many questions here on Stack Overflow and the documentation from Python, but cannot figure out how to use ftplib.FTP() to get into the site without a username and password (which is not required), and I can't figure out how to grep or use glob.glob inside of ftplib or urllib.

Thanks in advance for your help


Answer 1:


OK, this seems to work. There may be issues if it tries to download a directory or to list the contents of a plain file. Exception handling may come in handy to trap the wrong file types and skip them.

glob.glob cannot work since you're on a remote filesystem, but you can use fnmatch to match the names returned by the server.
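As a small illustration of that point, here is a sketch of fnmatch filtering a list of names locally. The file names are hypothetical stand-ins for whatever the server's listing returns; they are not taken from the actual FTP site.

```python
import fnmatch

# Hypothetical listing, standing in for what an NLST call might return.
names = ["DEM_3m_I_01.zip", "soils_01.zip", "lidar_DEM_hillshade.zip", "roads.zip"]

# fnmatchcase is case-sensitive on every platform, whereas plain
# fnmatch.fnmatch normalizes case on Windows.
dem_names = [n for n in names if fnmatch.fnmatchcase(n, "*DEM*")]
print(dem_names)
```

The pattern syntax is the same shell-style wildcard syntax that glob uses, so "*DEM*" keeps every name containing DEM.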

Here's the code: it downloads all files matching *DEM* into the TEMP directory, sorted into one subdirectory per remote directory.

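import fnmatch
import ftplib
import os
import posixpath
import sys
import tempfile

# Save under TEMP if it is set; otherwise fall back to the system temp directory.
output_root = os.getenv("TEMP") or tempfile.gettempdir()

fc = ftplib.FTP("ftp.igsb.uiowa.edu")
fc.login()  # no arguments: anonymous login, no username or password needed
fc.cwd("/gis_library/counties")

root_dirs = fc.nlst()
for county in root_dirs:
    sys.stderr.write(county + " ...\n")
    dir_files = fc.nlst(county)
    local_dir = os.path.join(output_root, county)
    os.makedirs(local_dir, exist_ok=True)

    for name in dir_files:
        # Some servers return "county/name" instead of "name"; keep the base name.
        name = posixpath.basename(name)
        if fnmatch.fnmatch(name, "*DEM*"):   # cannot use glob.glob on a remote filesystem
            sys.stderr.write("downloading " + county + "/" + name + " ...\n")
            local_filename = os.path.join(local_dir, name)
            with open(local_filename, 'wb') as fh:
                fc.retrbinary('RETR ' + county + "/" + name, fh.write)

fc.quit()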
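The exception handling mentioned at the top of this answer could look roughly like the sketch below. The helper name download_or_skip is hypothetical, and it assumes the server reports a non-downloadable entry (such as a directory) with a permanent 5xx reply, which ftplib raises as ftplib.error_perm.

```python
import ftplib
import os


def download_or_skip(ftp, remote_path, local_path):
    """Try to RETR remote_path into local_path; return True on success.

    Permanent FTP errors (5xx replies, e.g. when remote_path is a directory)
    are treated as "skip this entry" instead of aborting the whole run.
    """
    try:
        with open(local_path, "wb") as fh:
            ftp.retrbinary("RETR " + remote_path, fh.write)
        return True
    except ftplib.error_perm as exc:
        # Remove the empty or partial file left behind by the failed transfer.
        if os.path.exists(local_path):
            os.remove(local_path)
        sys_msg = "skipping %s: %s" % (remote_path, exc)
        print(sys_msg)
        return False
```

Inside the download loop, the with open(...) / retrbinary pair would then be replaced by a single call to this helper, so one bad entry does not stop the remaining downloads.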



Answer 2:


The answer by @Jean with the local pattern matching is the correct portable solution adhering to FTP standards.

However, since most FTP servers do support non-standard wildcards in file listing commands, you can almost always use a simpler and, more importantly, more efficient solution like:

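# `ftp` is an ftplib.FTP instance that is already logged in and has
# changed (cwd) into the directory to search.
files = ftp.nlst("*DEM*")
for f in files:
    with open(f, 'wb') as fh:
        ftp.retrbinary('RETR ' + f, fh.write)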
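Since wildcard support is non-standard, one option is to try it first and fall back to local matching. The following is only a sketch: list_matching is a hypothetical helper, and it assumes that a server without wildcard support either returns an empty listing or replies with a 550 error (raised by ftplib as ftplib.error_perm).

```python
import fnmatch
import ftplib
import posixpath


def list_matching(ftp, directory, pattern):
    """Return the base names in `directory` that match `pattern`.

    Tries the non-standard server-side wildcard first, then falls back to
    listing the whole directory and filtering locally with fnmatch.
    """
    try:
        names = ftp.nlst(posixpath.join(directory, pattern))
    except ftplib.error_perm:
        # Assumed behavior: some servers answer 550 for an unsupported
        # wildcard or when nothing matches.
        names = []
    if not names:
        names = ftp.nlst(directory)
    # Servers differ on whether NLST returns bare names or full paths,
    # so reduce to base names and filter again locally.
    base_names = [posixpath.basename(n) for n in names]
    return [n for n in base_names if fnmatch.fnmatchcase(n, pattern)]
```

The cost of the fallback is one extra listing per directory when the wildcard yields nothing, which keeps the fast path on servers that do support wildcards.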


Source: https://stackoverflow.com/questions/38943398/download-files-from-an-ftp-server-containing-given-string-using-python
