Question
For starters, I've only been playing with Python for about two weeks and I'm relatively new to its processes. I'm trying to create a script that compares two directories with subdirectories and prints out ANY changes. I've read articles on here about using os.walk to walk the directories, and I've managed to write a script that prints all the files in a directory and its subdirectories in an understandable manner. I've also read on here and learned how to compare two directories, but it only compares one level deep.
import os
x = 'D:\\xfiles'
y = 'D:\\yfiles'
q = [filename for filename in x if filename not in y]
print(q)
Obviously that does not do what I want it to. This, however, lists all files and all directories:
import os
x = 'D:\\xfiles'
x1 = os.walk(x)
for dirName, subdirList, fileList in x1:
    print('Directory: %s' % dirName)
    for fname in fileList:
        print('\\%s' % fname)
How do I combine them and get it to work?
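For reference, the first attempt fails because x and y are plain strings: iterating over a string yields its characters, so the list comprehension compares the letters of the two path names rather than any files. A minimal illustration in Python 3 syntax:

```python
x = 'D:\\xfiles'
y = 'D:\\yfiles'

# Iterating over a str yields single characters, not directory entries
q = [filename for filename in x if filename not in y]
print(q)  # ['x']: the only character of x that does not occur in y
```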
Answer 1:
I guess the best way to go would be external programs, as @Robᵩ suggests in a comment.
Using Python, I would recommend the following:
import os

def fileIsSame(right, left, path):
    # True if a file with the same relative path exists under `left`
    # (existence only; contents and dates are not compared)
    return os.path.exists(os.path.join(left, os.path.relpath(path, right)))

def compare(right, left):
    difference = []
    # os.walk already descends into every subdirectory,
    # so no explicit recursion is needed
    for root, dirs, files in os.walk(right):
        for name in files:
            path = os.path.join(root, name)
            # check if the file also exists in `left`
            if not fileIsSame(right, left, path):
                # count a missing file as a difference
                difference.append(path)
    return difference
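This approach lacks a proper fileIsSame function: one that would check whether files are the same by content or by modification date, and that would be sure to handle paths correctly (I'm not sure this variant does). This algorithm requires you to specify full paths. It also only reports files that are in the first folder but missing from the second; to catch files that exist only in the second folder, call it again with the arguments swapped.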
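One way to fill that gap, assuming "same" should mean "same bytes", is the standard library's filecmp.cmp with shallow=False, which compares file contents instead of trusting os.stat() information. This sketch is not part of the original answer: compare_contents is a hypothetical name, it returns paths relative to the first folder, and like compare it only looks at files under the first folder, so files that exist only in the second one are not reported. It needs Python 3 for tempfile.TemporaryDirectory.

```python
import filecmp
import os
import tempfile

def compare_contents(right, left):
    """Hypothetical helper: list files under `right` that are missing from
    `left` or whose contents differ from the file at the same relative path."""
    difference = []
    for root, _dirs, files in os.walk(right):
        for name in files:
            path = os.path.join(root, name)
            # Build the matching path under `left` from the relative path
            other = os.path.join(left, os.path.relpath(path, right))
            # shallow=False forces a byte-by-byte comparison of contents
            if not os.path.isfile(other) or not filecmp.cmp(path, other, shallow=False):
                difference.append(os.path.relpath(path, right))
    return sorted(difference)

# Usage: build two small trees in temporary directories
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for root, rel, text in [
        (a, 'same.txt', 'hello'),
        (b, 'same.txt', 'hello'),
        (a, os.path.join('sub', 'changed.txt'), 'old'),
        (b, os.path.join('sub', 'changed.txt'), 'new'),
        (a, 'only_in_a.txt', 'x'),
    ]:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, 'w') as fh:
            fh.write(text)
    result = compare_contents(a, b)

print(result)  # only_in_a.txt and sub/changed.txt (with the OS path separator)
```

Reading every file byte by byte is slower than an existence check, so on large trees it may be worth comparing sizes or modification times first and only falling back to contents when those match.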
Usage example:
print(compare(r'c:\test', r'd:\copy_of_test'))
If the second folder is a copy of the first, the differences in the root paths (drive letter and folder name) are ignored, and the output will be [].
Answer 2:
Write a function to aggregate your listing.
import os

def listfiles(path):
    files = []
    for dirName, subdirList, fileList in os.walk(path):
        # current directory relative to the root being listed
        rel_dir = dirName.replace(path, '')
        for fname in fileList:
            files.append(os.path.join(rel_dir, fname))
    return files
x = listfiles('D:\\xfiles')
y = listfiles('D:\\yfiles')
You could use a list comprehension to extract the files that are in x but not in y.
q = [filename for filename in x if filename not in y]
But using sets is much more efficient (each membership test is a hash lookup instead of a scan of the whole list) and more flexible:
files_only_in_x = set(x) - set(y)
files_only_in_y = set(y) - set(x)
files_only_in_either = set(x) ^ set(y)
files_in_both = set(x) & set(y)
all_files = set(x) | set(y)
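To see what each set operation returns, here is a small example on hypothetical relative paths of the kind listfiles produces; sorted() is used only to make the printed order deterministic, since sets are unordered:

```python
x = ['a.txt', 'sub/b.txt', 'sub/c.txt']
y = ['a.txt', 'sub/c.txt', 'new.txt']

files_only_in_x = set(x) - set(y)       # difference
files_only_in_y = set(y) - set(x)
files_only_in_either = set(x) ^ set(y)  # symmetric difference
files_in_both = set(x) & set(y)         # intersection
all_files = set(x) | set(y)             # union

print(sorted(files_only_in_x))       # ['sub/b.txt']
print(sorted(files_only_in_y))       # ['new.txt']
print(sorted(files_only_in_either))  # ['new.txt', 'sub/b.txt']
print(sorted(files_in_both))         # ['a.txt', 'sub/c.txt']
print(sorted(all_files))             # ['a.txt', 'new.txt', 'sub/b.txt', 'sub/c.txt']
```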
Answer 3:
import os

def ls(path):
    entries = []
    for base, sub_f, files in os.walk(path):
        # record subdirectories and files as paths relative to `path`
        for sub in sub_f:
            entry = os.path.join(base, sub)
            entry = entry[len(path):].lstrip(os.sep)
            entries.append(entry)
        for file in files:
            entry = os.path.join(base, file)
            entry = entry[len(path):].lstrip(os.sep)
            entries.append(entry)
    entries.sort()
    return entries

def folder_diff(folder1_path, folder2_path):
    folder1_list = ls(folder1_path)
    folder2_list = ls(folder2_path)
    # entries present in only one of the two folders
    diff = [item for item in folder1_list if item not in folder2_list]
    diff.extend([item for item in folder2_list if item not in folder1_list])
    return diff
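As an alternative to walking both trees by hand, the standard library's filecmp.dircmp compares two directory trees and exposes left_only, right_only, diff_files and subdirs (a dict mapping each common subdirectory name to its own dircmp). The sketch below is not from any of the answers; report_changes is a hypothetical name. Note that dircmp compares files shallowly by default (files whose os.stat() type, size and modification time match are treated as identical without reading them) and skips the names in filecmp.DEFAULT_IGNORES, such as version-control folders.

```python
import filecmp
import os
import tempfile

def report_changes(dcmp, prefix=''):
    """Hypothetical helper: recursively collect differences from a dircmp."""
    changes = []
    for name in dcmp.left_only:
        changes.append(('only in left', os.path.join(prefix, name)))
    for name in dcmp.right_only:
        changes.append(('only in right', os.path.join(prefix, name)))
    for name in dcmp.diff_files:
        changes.append(('differs', os.path.join(prefix, name)))
    # subdirs maps each common subdirectory name to its own dircmp object
    for name, sub in dcmp.subdirs.items():
        changes.extend(report_changes(sub, os.path.join(prefix, name)))
    return sorted(changes)

# Usage: two small trees in temporary directories
with tempfile.TemporaryDirectory() as left, tempfile.TemporaryDirectory() as right:
    os.makedirs(os.path.join(left, 'sub'))
    os.makedirs(os.path.join(right, 'sub'))
    with open(os.path.join(left, 'sub', 'f.txt'), 'w') as fh:
        fh.write('one')
    with open(os.path.join(right, 'sub', 'f.txt'), 'w') as fh:
        fh.write('three')  # different size, so the shallow check reads the contents
    with open(os.path.join(left, 'extra.txt'), 'w') as fh:
        fh.write('x')
    # dircmp computes its attributes lazily, so query it before the dirs are removed
    changes = report_changes(filecmp.dircmp(left, right))

print(changes)
```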
Source: https://stackoverflow.com/questions/19251993/comparing-two-directories-with-subdirectories-to-find-any-changes