How to create multiple folders with names, and extract multiple zips to each different folder, with python?

♀尐吖头ヾ submitted on 2019-12-23 21:14:46

Question


I'm having trouble writing a clean script that creates a separate directory for each of many zip files containing raster data and then extracts every zip into its new folder.

I have accomplished my task, but my code is very long and messy. I need folders labeled like NE34_E, NE35_E, etc., and within these directories I need subfolders such as N34_24, N34_25, etc., into which the raster data will be extracted. I have over 100 zip files that need to be extracted and placed in subfolders.

After making some changes to the way I was making directories, this is a sample of my script.

My file structure goes like this:

N\\N36_E\\N36_24
N\\N36_E\\N35_25
... etc.

Zipfile names:

n36_e024_1arc_v3_bil.zip
n36_e025_1arc_v3_bil.zip
n36_e026_1arc_v3_bil.zip
... etc.

Python code to create the directory structure:

import os

#Create Sub directories for "NE36_"
pathname1 = "NE36_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)

#Create Sub directories for "NE37_"
pathname1 = "NE37_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)

Answer 1:


import glob, os, re, zipfile

# Setup main paths.
zipfile_rootdir = r'D:\Capstone\Zipfiles'
extract_rootdir = r'D:\Capstone\Test'

# Process the zip files.
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

for zip_file in glob.iglob(os.path.join(zipfile_rootdir, '*.zip')):

    # Get the parts from the base zip filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(zip_file))[0]

    # Make all items in part uppercase using a list comprehension.
    part = [item.upper() for item in part]

    # Create a dict of the parts to make useful parts to be used for folder names.
    # E.g. from ['N', '36', 'E', '24']
    folder = {'outer': '{0}{1}_{2}'.format(*part),
              'inner': '{0}{2}{1}_{3}'.format(*part)}

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, folder['outer'], folder['inner'])

    # Perform the extract of all files from the zipfile.
    with zipfile.ZipFile(zip_file, 'r') as archive:
        archive.extractall(extract_path)

There are two main settings to set:

  1. zipfile_rootdir is where the zip files are located.
  2. extract_rootdir is where to extract to.

The r before each string marks it as a raw string, so the backslashes do not need to be escaped.
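As a quick check of the raw-string point, here is a minimal sketch comparing the escaped spelling used in the question with the raw spelling used in the answer. The variable names are only illustrative.

```python
# The same Windows path written two ways.
escaped = "D:\\Capstone\\Test"   # every backslash doubled
raw = r"D:\Capstone\Test"        # raw string, backslashes written once

# Both literals produce an identical string object value.
print(escaped == raw)  # True
print(len(raw))        # 16
```

One caveat worth knowing: a raw string literal cannot end with a single backslash, so a trailing separator is better added with os.path.join.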

A regular expression is compiled and used to extract the text from the zip file names used for the extraction path.

From zip file:

n36_e024_1arc_v3_bil.zip

extracts a part sequence with use of a regular expression:

n, 36, e, 24

Each item is uppercased and used to create a dictionary named folder containing these keys and values:

'outer': 'N36_E'
'inner': 'NE36_24'

extract_path will store the full path by joining extract_rootdir with folder['outer'] and folder['inner'].
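The join step can be sketched on its own. This example uses ntpath, the Windows flavour of os.path, purely so that it prints the same result on any operating system; on Windows, os.path.join behaves the same way. The folder values are hard-coded here for illustration.

```python
import ntpath  # Windows implementation of os.path, importable on any OS

extract_rootdir = r'D:\Capstone\Test'
folder = {'outer': 'N36_E', 'inner': 'NE36_24'}

# Join the root with the outer and inner folder names.
extract_path = ntpath.join(extract_rootdir, folder['outer'], folder['inner'])
print(extract_path)  # D:\Capstone\Test\N36_E\NE36_24
```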

Finally, each zip file is opened with a context manager (the with statement), which extracts its contents and then closes the archive automatically.
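To try the extraction step without touching real data, the following sketch builds a throwaway zip in a temporary directory and extracts it into a nested folder that does not exist yet. The member name and its contents are made up for the test. It also shows why the script needs no os.makedirs call: extractall creates the missing target directories itself.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small stand-in archive (file name and bytes are placeholders).
    zip_path = os.path.join(tmp, 'n36_e024_1arc_v3_bil.zip')
    with zipfile.ZipFile(zip_path, 'w') as archive:
        archive.writestr('n36_e024_1arc_v3.bil', b'placeholder raster bytes')

    # The nested target folders do not exist before the extract.
    extract_path = os.path.join(tmp, 'Test', 'N36_E', 'NE36_24')
    with zipfile.ZipFile(zip_path, 'r') as archive:
        archive.extractall(extract_path)

    extracted = sorted(os.listdir(extract_path))
    print(extracted)  # ['n36_e024_1arc_v3.bil']
```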


Regular Expression:

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

The regular expression pattern is compiled once before the loop to avoid compiling it again on every iteration. The r before the string tells Python to interpret it as a raw string, i.e. with no backslash escaping. Raw strings are useful for regular expressions because the patterns themselves rely on backslashes.

The regular expression pattern:

\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)

The string for the regular expression to work on:

n36_e024_1arc_v3_bil.zip

  1. \A matches only at the start of the string. This is an anchor and does not match any character.
  2. ([a-zA-Z]) matches a single letter. The [] matches any one of the characters inside it, here anything in the ranges a to z and A to Z, so n is matched. The enclosing () stores the captured group in the returned sequence, so the sequence is now n,.
  3. (\d+) matches one or more digits. The \d is any digit and + tells it to keep matching more. The sequence becomes n, 36,.
  4. _ is a literal and, since no () encloses it, it is matched but not added to the sequence.
  5. ([a-zA-Z]) is the same as point 2. The sequence becomes n, 36, e,.
  6. 0{0,2} matches the digit 0 between zero and two times ({0,2}). There is no (), so it is not added to the sequence.
  7. (\d+) is the same as point 3. The sequence becomes n, 36, e, 24.
  8. The rest of the string is ignored because the pattern has reached its end. The \A anchor is used so that the pattern cannot start matching somewhere later in the string, where a match is not wanted.
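The walkthrough above can be checked with a few sample names. This is a minimal sketch: split_name is a hypothetical helper, not part of the script, and it uses match().groups() instead of findall so that a name that does not fit the pattern returns None rather than raising an IndexError on the [0] lookup. The second and third file names are invented test inputs.

```python
import re

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

def split_name(filename):
    """Return the four captured parts as a tuple, or None if there is no match."""
    match = re_pattern.match(filename)
    return match.groups() if match else None

# Two leading zeros are consumed by 0{0,2} and dropped.
print(split_name('n36_e024_1arc_v3_bil.zip'))  # ('n', '36', 'e', '24')

# With no leading zero, 0{0,2} matches nothing and all digits are kept.
print(split_name('n36_e124_1arc_v3_bil.zip'))  # ('n', '36', 'e', '124')

# A name that does not follow the scheme gives None.
print(split_name('readme.zip'))                # None
```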

Formatting:

Sequence is N, 36, E, 24 after being uppercased by the list comprehension.

  1. The pattern {0}{1}_{2} is ordered 0, 1, 2, so 0 is N, 1 is 36 and 2 is E to become N36_E. The _ is literal in the pattern.
  2. The pattern {0}{2}{1}_{3} is ordered 0, 2, 1, 3. 0 is N, 2 is E, 1 is 36 and 3 is 24 to become NE36_24.
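The two format patterns can be tried in isolation with the uppercased sequence typed in by hand. Note that the first pattern receives all four items but only uses indexes 0 to 2; str.format ignores positional arguments that the pattern does not reference.

```python
part = ['N', '36', 'E', '24']

# Indexes refer to positions in part; the underscore is literal text.
outer = '{0}{1}_{2}'.format(*part)
inner = '{0}{2}{1}_{3}'.format(*part)

print(outer)  # N36_E
print(inner)  # NE36_24
```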


Source: https://stackoverflow.com/questions/56498940/how-to-create-multiple-folders-with-names-and-extract-multiple-zips-to-each-dif
