Python: Parsing and grouping filenames in directory

Posted by 十年热恋 on 2020-01-05 09:09:32

Question


I'm pretty new to Python, but I have lots of experience with MATLAB & C.

What I need to do is parse the filenames of files in a particular directory, separate them into groups according to the fields within the filenames, and perform operations within these groups.

Specifically, the filenames are:

PROJECT-x-SUBJECT-x-SESSION-x-TYPE.extension

where '-x-' has been purposely inserted as the field divider. I need to do operations on every group of files that shares the same PROJECT-x-SUBJECT-x-SESSION component.

My best attempt follows:

I can parse each of the files one at a time by:

import os

PROJECT_list, SUBJECT_list = [], []
dirList = os.listdir(directory)
for fname in dirList:
    # strip the extension
    ext = os.path.splitext(fname)
    # get the 4 fields
    labels = ext[0].split('-x-')
    PROJECT_list.append(labels[0])
    SUBJECT_list.append(labels[1])
    ...

... which reflects the only idea I have had on how to organize this: creating 4 lists and appending to them for each filename.

Then with my 4 (ordered?) lists, I could then call something like:

from collections import Counter
c=Counter(SESSION_list) 
list(c)

Then at least I have a unique list of SESSION names.

Suggestions? I could go on, but since I really just need a starting point, I think that this is sufficient.

Thanks, guys.


Answer 1:


You can use defaultdict to make a dictionary that contains lists:

import os
from collections import defaultdict

groups = defaultdict(list)

for filename in os.listdir(directory):
    basename, extension = os.path.splitext(filename)
    project, subject, session, ftype = basename.split('-x-')

    groups[session].append(filename)

Now, groups maps each session name to the list of filenames that belong to that session.
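Since the question asks for groups that share the whole PROJECT-x-SUBJECT-x-SESSION component, one possible variation is to key the dictionary on a tuple of the first three fields rather than on the session alone. The sketch below is only an illustration: the function name group_by_session, the choice to skip names that do not have exactly four fields, and the sample filenames are all assumptions, not part of the original answer.

```python
import os
from collections import defaultdict


def group_by_session(filenames, sep='-x-'):
    """Group filenames by their (project, subject, session) fields."""
    groups = defaultdict(list)
    for filename in filenames:
        basename, _ext = os.path.splitext(filename)
        fields = basename.split(sep)
        if len(fields) != 4:
            # Assumption: silently skip names that do not match the pattern.
            continue
        project, subject, session, _ftype = fields
        groups[(project, subject, session)].append(filename)
    return groups


# Hypothetical filenames, for illustration only.
names = [
    'P1-x-S1-x-A-x-raw.dat',
    'P1-x-S1-x-A-x-log.txt',
    'P1-x-S2-x-A-x-raw.dat',
    'notes.txt',
]
groups = group_by_session(names)
for key, members in sorted(groups.items()):
    print(key, members)
```

In real use, the list of names would come from os.listdir(directory) instead of the hard-coded sample.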




Answer 2:


How about using a defaultdict to group filenames, glob to find the appropriate files, and fileinput to read lines from all files with the same key? (untested)

import os
from glob import glob
import fileinput
from collections import defaultdict

filenames = glob('*-x-*')
dd = defaultdict(list)
for filename in filenames:
    name, ext = os.path.splitext(filename)
    dd[tuple(name.split('-x-')[:3])].append(filename)

# Python 3: dict.items() replaces the Python 2 dict.iteritems()
for key, fnames in dd.items():
    for line in fileinput.FileInput(fnames):
        pass  # do something with lines from files with same key
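To make the "do something" step concrete, here is a rough, self-contained sketch that counts the lines in each group. It is an assumption-laden illustration rather than the answerer's code: the helper name count_lines_per_group, the temporary directory, and the sample files are all made up, and it lists a directory with os.listdir instead of globbing the current working directory.

```python
import fileinput
import os
import tempfile
from collections import defaultdict


def count_lines_per_group(directory, sep='-x-'):
    """Return {(project, subject, session): total line count of its files}."""
    groups = defaultdict(list)
    for filename in sorted(os.listdir(directory)):
        name, _ext = os.path.splitext(filename)
        fields = name.split(sep)
        if len(fields) != 4:
            continue  # assumption: ignore names that do not match the pattern
        groups[tuple(fields[:3])].append(os.path.join(directory, filename))

    counts = {}
    for key, paths in groups.items():
        # FileInput chains the files so they read as one stream of lines.
        with fileinput.FileInput(files=paths) as lines:
            counts[key] = sum(1 for _ in lines)
    return counts


# Hypothetical sample files, written to a temporary directory.
samples = {
    'P1-x-S1-x-A-x-raw.txt': 'a\nb\n',
    'P1-x-S1-x-A-x-log.txt': 'c\n',
    'P1-x-S2-x-A-x-raw.txt': 'd\ne\n',
}
with tempfile.TemporaryDirectory() as tmp:
    for fname, text in samples.items():
        with open(os.path.join(tmp, fname), 'w') as fh:
            fh.write(text)
    counts = count_lines_per_group(tmp)

print(counts)
```

Reading each group through a single FileInput object keeps the per-group logic in one loop; if files need to be handled individually, iterating over the paths and opening each with open() would work just as well.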


Source: https://stackoverflow.com/questions/14719621/python-parsing-and-grouping-filenames-in-directory
