Question
I'm having trouble writing a clean script that creates many different directories for a number of zip files containing raster data and then extracts each zip into its new folder.
I have accomplished my task, but my code is very long and messy. I need folders labeled like NE34_E, NE35_E, etc., and within these directories I need subfolders such as N34_24, N34_25, etc., which the raster data will be extracted to. I have over 100 zip files that need to be extracted and placed in subfolders.
After making some changes to the way I was making directories, this is a sample of my script.
My file structure goes like this:
N\\N36_E\\N36_24
N\\N36_E\\N35_25
... etc.
Zipfile names:
n36_e024_1arc_v3_bil.zip
n36_e025_1arc_v3_bil.zip
n36_e026_1arc_v3_bil.zip
... etc.
Python code to create the directory structure:
import os

#Create Sub directories for "NE36_"
pathname1 = "NE36_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)

#Create Sub directories for "NE37_"
pathname1 = "NE37_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)
Answer 1:
import glob, os, re, zipfile

# Setup main paths.
zipfile_rootdir = r'D:\Capstone\Zipfiles'
extract_rootdir = r'D:\Capstone\Test'

# Process the zip files.
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

for zip_file in glob.iglob(os.path.join(zipfile_rootdir, '*.zip')):

    # Get the parts from the base zip filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(zip_file))[0]

    # Make all items in part uppercase using a list comprehension.
    part = [item.upper() for item in part]

    # Create a dict of the parts to make useful parts to be used for folder names.
    # E.g. from ['N', '36', 'E', '24']
    folder = {'outer': '{0}{1}_{2}'.format(*part),
              'inner': '{0}{2}{1}_{3}'.format(*part)}

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, folder['outer'], folder['inner'])

    # Perform the extract of all files from the zipfile.
    with zipfile.ZipFile(zip_file, 'r') as zip:
        zip.extractall(extract_path)
There are 2 main settings to set:
zipfile_rootdir is where the zip files are located.
extract_rootdir is where to extract to.
The r before the string marks it as a raw string, so backslash escaping is not needed.
A regular expression is compiled and used to pull the parts needed for the extraction path out of each zip file name.
From the zip file name:
n36_e024_1arc_v3_bil.zip
the regular expression extracts a sequence of parts:
n, 36, e, 24
Each item is uppercased and used to create a dictionary named folder containing the keys and values:
'outer': 'N36_E'
'inner': 'NE36_24'
extract_path will store the full path by joining extract_rootdir with folder['outer'] and folder['inner'].
Finally, using a context manager by use of with, the zip files will be extracted.
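The steps above can be collected into one function and tried against a temporary directory. This is only a minimal sketch under stated assumptions, not the code from the answer: the helper name extract_zips is hypothetical, and skipping file names that do not match the pattern is an added choice (the original indexing with [0] would raise IndexError on such a file).

```python
import glob
import os
import re
import zipfile

# Same pattern as in the answer: letter, digits, underscore, letter,
# up to two padding zeros, digits.
RE_PATTERN = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')


def extract_zips(zipfile_rootdir, extract_rootdir):
    """Extract each matching zip into <outer>/<inner>; return the paths used.

    Hypothetical helper for illustration. Zip files whose names do not
    match the pattern are skipped (an assumption, see the note above).
    """
    extracted = []
    for zip_file in sorted(glob.glob(os.path.join(zipfile_rootdir, '*.zip'))):
        match = RE_PATTERN.match(os.path.basename(zip_file))
        if match is None:
            continue
        part = [item.upper() for item in match.groups()]
        outer = '{0}{1}_{2}'.format(*part)       # e.g. N36_E
        inner = '{0}{2}{1}_{3}'.format(*part)    # e.g. NE36_24
        extract_path = os.path.join(extract_rootdir, outer, inner)
        # extractall() creates missing directories while writing the members.
        with zipfile.ZipFile(zip_file, 'r') as archive:
            archive.extractall(extract_path)
        extracted.append(extract_path)
    return extracted
```

Calling extract_zips(r'D:\Capstone\Zipfiles', r'D:\Capstone\Test') would mirror the two settings used in the answer. The archive is bound to the name archive rather than zip so that the built-in zip function is not shadowed.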
Regular Expression:
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')
The regular expression pattern is compiled before the loop to avoid compiling it again on every pass through the loop.
The r before the string informs Python that the string should be interpreted as raw, i.e. no backslash escaping.
Raw strings are useful for regular expressions because the patterns themselves use backslashes.
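The two points about raw strings and compiling once can be checked with a short sketch. The variable names below are illustrative only, and the two file names are taken from the question.

```python
import re

# A raw string keeps backslashes as written; a normal string needs them doubled.
raw_path = r'D:\Capstone\Test'
escaped_path = 'D:\\Capstone\\Test'
same_path = raw_path == escaped_path   # both spell the same characters

# In a pattern, r'\d' is two characters (a backslash and a d) that the re
# module then interprets as "any digit".
digit_token_length = len(r'\d')

# Compile once, outside the loop, and reuse the compiled object inside it.
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')
names = ['n36_e024_1arc_v3_bil.zip', 'n36_e025_1arc_v3_bil.zip']
matched = [re_pattern.match(name) is not None for name in names]
```

The re module also keeps an internal cache of recently compiled patterns, so the saving from compiling up front is usually small; the main benefit is a named, reusable pattern object.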
The regular expression pattern:
\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)
The string for the regular expression to work on:
n36_e024_1arc_v3_bil.zip
1. \A matches only at the start of the string. This is an anchor and does not match any character.
2. ([a-zA-Z]) matches any alphabet character. [] matches any one of the characters within, so any character in the range a to z or A to Z is matched. n will be matched. The enclosing () stores the captured group in the returned sequence. So the sequence is now n,.
3. (\d+) matches 1 digit or more. The \d is any digit and + tells it to keep matching more. Sequence becomes n, 36,.
4. _ is literal and since () is not enclosing it, it is matched though it is not added to the sequence.
5. ([a-zA-Z]) Same as point 2. Sequence becomes n, 36, e,.
6. 0{0,2} matches a zero 0, zero to 2 times {0,2}. No (), so not added to the sequence.
7. (\d+) Same as point 3. Sequence becomes n, 36, e, 24.
8. The rest of the string is ignored as the pattern has reached its end. The \A is used so that the pattern must match from the start of the string and cannot begin matching at some later, unwanted position.
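The walkthrough above can be verified with a few calls. This is a small sketch: the helper name parts is hypothetical, and every file name other than n36_e024_1arc_v3_bil.zip is made up to probe the pattern.

```python
import re

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')


def parts(name):
    """Return the four captured groups, or None when the name does not match."""
    match = re_pattern.match(name)
    return match.groups() if match else None


# The sample from the text: the leading zero of 024 is consumed by 0{0,2}.
sample = parts('n36_e024_1arc_v3_bil.zip')        # ('n', '36', 'e', '24')

# No padding zero present: 0{0,2} matches nothing and (\d+) takes all digits.
three_digits = parts('n36_e124_1arc_v3_bil.zip')  # ('n', '36', 'e', '124')

# All zeros: 0{0,2} takes two of them, (\d+) still needs one digit.
all_zeros = parts('n36_e000_1arc_v3_bil.zip')     # ('n', '36', 'e', '0')

# \A anchors the match at the start, so a shifted name is rejected.
shifted = parts('xx_n36_e024.zip')                # None
```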
Formatting:
The sequence is N, 36, E, 24 after being uppercased by the list comprehension.
- The pattern {0}{1}_{2} is ordered 0, 1, 2, so 0 is N, 1 is 36 and 2 is E to become N36_E. The _ is literal in the pattern.
- The pattern {0}{2}{1}_{3} is ordered 0, 2, 1, 3. 0 is N, 2 is E, 1 is 36 and 3 is 24 to become NE36_24.
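The two format patterns can be tried on their own with the uppercased sequence. A minimal sketch, using the same values as the text:

```python
# The uppercased parts from n36_e024_1arc_v3_bil.zip.
part = ['N', '36', 'E', '24']

# *part unpacks the list into positional arguments 0, 1, 2 and 3.
outer = '{0}{1}_{2}'.format(*part)      # 'N36_E'
inner = '{0}{2}{1}_{3}'.format(*part)   # 'NE36_24'

# str.format ignores positional arguments a pattern does not reference,
# which is why the outer pattern can skip part[3] without an error.
```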
References:
Python 2 and Python 3 documentation:
- re module for the regular expressions.
- format method for the formatting of strings.
- list comprehensions used to uppercase items in the sequence.
- zipfile module for working with zip archives.
Source: https://stackoverflow.com/questions/56498940/how-to-create-multiple-folders-with-names-and-extract-multiple-zips-to-each-dif