Sort a list of strings based on regular expression match

放肆的年华 submitted on 2020-05-23 09:23:49

Question


I have a text file that looks a bit like:

random text random text, can be anything blabla %A blabla
random text random text, can be anything blabla %D blabla
random text random text, can be anything blabla blabla %F
random text random text, can be anything blabla blabla
random text random text, %C can be anything blabla blabla

When I read it in with readlines(), it becomes a list of lines. Now I want this list sorted by the letter after the %. When the sort is applied to the above, the result should look like:

random text random text, can be anything blabla %A blabla
random text random text, %C can be anything blabla blabla
random text random text, can be anything blabla %D blabla
random text random text, can be anything blabla blabla %F
random text random text, can be anything blabla blabla

Is there a good way to do this, or will I have to break each string into tuples, move the letters to a specific column, and then sort using key=operator.itemgetter(col)?

Thank you


Answer 1:


In [1]: import re
   ...: 
   ...: def grp(pat, txt):
   ...:     r = re.search(pat, txt)
   ...:     # '&' sorts just after '%' in ASCII, so lines without a match end up last
   ...:     return r.group(0) if r else '&'

In [2]: y
Out[2]: 
['random text random text, can be anything blabla %A blabla',
 'random text random text, can be anything blabla %D blabla',
 'random text random text, can be anything blabla blabla %F',
 'random text random text, can be anything blabla blabla',
 'random text random text, %C can be anything blabla blabla']

In [3]: y.sort(key=lambda l: grp(r"%\w", l))

In [4]: y
Out[4]: 
['random text random text, can be anything blabla %A blabla',
 'random text random text, %C can be anything blabla blabla',
 'random text random text, can be anything blabla %D blabla',
 'random text random text, can be anything blabla blabla %F',
 'random text random text, can be anything blabla blabla']
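
The session above relies on '&' sorting just after '%' in ASCII so that unmatched lines land last. As a rough sketch of the same idea without a sentinel character, the key below returns a tuple instead. The name letter_key is made up for this example, and the pattern assumes the marker is a '%' followed by a single word character.

```python
import re

# Hypothetical helper; assumes the marker is "%" followed by one word character.
MARKER = re.compile(r"%(\w)")

def letter_key(line):
    m = MARKER.search(line)
    # (False, letter) sorts before (True, ""), so unmatched lines go last.
    return (m is None, m.group(1) if m else "")

lines = [
    "random text random text, can be anything blabla %A blabla",
    "random text random text, can be anything blabla %D blabla",
    "random text random text, can be anything blabla blabla %F",
    "random text random text, can be anything blabla blabla",
    "random text random text, %C can be anything blabla blabla",
]
result = sorted(lines, key=letter_key)
```

Because Python's sort is stable, lines that share a key (for example several lines without a marker) keep their original relative order.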



Answer 2:


What about this? Hope it helps.

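def k(line):
    # Text after the first "%"; empty if the line has no "%"
    v = line.partition("%")[2]
    # "z" stands in for the max value, so lines without "%" sort last
    return v[0] if v else "z"

# Open in text mode so each line is a str (Python 3)
with open("data.txt") as f:
    print("".join(sorted(f, key=k)))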
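
One detail the snippet above leaves open is a final line with no trailing newline: joining with an empty string would glue it to whichever line sorts after it. A minimal sketch that sidesteps this by splitting the text into lines first. The names first_after_percent and sort_text are hypothetical, and the in-memory string stands in for the contents of data.txt.

```python
def first_after_percent(line):
    # Character right after the first "%", or "z" as a stand-in maximum
    # (the same convention as the answer above).
    rest = line.partition("%")[2]
    return rest[0] if rest else "z"

def sort_text(text):
    # splitlines() drops line endings, so a missing final newline is harmless.
    return "\n".join(sorted(text.splitlines(), key=first_after_percent))

sample = "blabla %D blabla\nblabla blabla\nblabla %A blabla"
print(sort_text(sample))
```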



Answer 3:


You could use a custom key function to compare the strings. Using the lambda syntax you can write that inline, like so:

strings.sort(key=lambda s: re.sub(r".*%", "", s))

The re.sub(r".*%", "", s) call removes everything up to and including the last percent sign on the line (the .* is greedy), so if the string has a percent sign the sort compares what comes after it; otherwise it compares the entire string.

Pedantically speaking, this doesn't use just the letter following the percent sign; it also uses everything after it. If you want the letter and only the letter, try this slightly longer line:

strings.sort(key=lambda s: re.sub(r".*%(.).*", r"\1", s))
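
Because .* is greedy, a line with more than one percent sign is keyed on what follows the last one. A small illustration, using a made-up sample line, of how a non-greedy pattern picks up the first percent sign instead:

```python
import re

line = "blabla %B more %A end"  # made-up line with two markers

# Greedy: ".*" runs to the last "%", so the key is based on "A".
greedy = re.sub(r".*%(.).*", r"\1", line)

# Non-greedy: ".*?" stops at the first "%", so the key is based on "B".
lazy = re.sub(r"^.*?%(.).*", r"\1", line)

print(greedy, lazy)
```

For input like the question's, where each line has at most one marker, the two patterns give the same key.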



Answer 4:


Here is a quick-and-dirty approach. Without knowing more about the requirements of your sort, I can't know if this satisfies your need.

Assume that your list is held in 'listoflines':

listoflines.sort(key=lambda x: x[x.find('%'):])

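Note that lines without a '%' character are sorted by their final character: find() returns -1 for them, so the slice becomes x[-1:]. For lines read with readlines() that character is usually the trailing newline, which sorts before '%' and so puts those lines first.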
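
The caveat comes from str.find returning -1 when nothing is found. A short sketch of that behaviour and of one possible guard. The helper from_percent is hypothetical, and placing unmarked lines last is an assumption about what is wanted.

```python
def from_percent(line):
    i = line.find("%")
    # (False, suffix) sorts before (True, ""), so lines without "%" go last.
    return (i == -1, line[i:] if i != -1 else "")

plain = "no marker here"
assert plain.find("%") == -1
assert plain[plain.find("%"):] == "e"  # the slice is just the final character

listoflines = ["blabla %D x", "no marker here", "%A y"]
listoflines.sort(key=from_percent)
```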



Source: https://stackoverflow.com/questions/1082413/sort-a-list-of-strings-based-on-regular-expression-match
