How to split a list-of-strings into sublists-of-strings by a specific string element

ぃ、小莉子 提交于 2020-05-10 07:14:06

问题


I have a word list like below. I want to split the list by .. Is there any better or useful code in Python 3?

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
result = []
tmp = []
for elm in a:
    if elm is not '.':
        tmp.append(elm)
    else:
        result.append(tmp)
        tmp = []
print(result)
# result: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

Update

Add test cases to handle it correctly.

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
def split_list(list_data, split_word='.'):
    result = []
    sub_data = []
    for elm in list_data:
        if elm is not split_word:
            sub_data.append(elm)
        else:
            if len(sub_data) != 0:
                result.append(sub_data)
            sub_data = []
    if len(sub_data) != 0:
        result.append(sub_data)
    return result

print(split_list(a)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
print(split_list(b)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_list(c)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

回答1:


Using itertools.groupby

from itertools import groupby
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
result = [list(g) for k,g in groupby(a,lambda x:x=='.') if not k]
print (result)
#[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]



回答2:


You can do this all with a "one-liner" using list comprehension and string functions join, split, strip, and no additional libraries.

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']



In [5]: [i.strip().split(' ') for i in ' '.join(a).split('.') if len(i) > 0 ]
Out[5]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

In [8]: [i.strip().split(' ') for i in ' '.join(b).split('.') if len(i) > 0 ]
Out[8]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

In [9]: In [8]: [i.strip().split(' ') for i in ' '.join(c).split('.') if len(i) > 0 ]
Out[9]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

@Craig has a simpler update:

[s.split() for s in ' '.join(a).split('.') if s]



回答3:


Here's another way using only standard list operations (with no dependencies on other libraries!). First we find the split points and then we create sublists around them; notice that the first element is treated as a special case:

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
indexes = [-1] + [i for i, x in enumerate(a) if x == '.']

[a[indexes[i]+1:indexes[i+1]] for i in range(len(indexes)-1)]
=> [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]



回答4:


You can reconstruct the string using ' '.join and use regex:

import re
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
new_s = [b for b in [re.split('\s', i) for i in re.split('\s*\.\s*', ' '.join(a))] if all(b)]

Output:

[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]



回答5:


I couldn't help myself, just wanted to have fun with this great question:

import itertools

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

def split_dots(lst):

    dots = [0] + [i+1 for i, e in enumerate(lst) if e == '.']

    result = [list(itertools.takewhile(lambda x : x != '.', lst[dot:])) for dot in dots]

    return list(filter(lambda x : x, result))

print(split_dots(a)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
print(split_dots(b)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_dots(c)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]



回答6:


This answer requires installing a 3rd party library: iteration_utilities1. The included split function makes solving this task straightforward:

>>> from iteration_utilities import split
>>> a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
>>> list(filter(None, split(a, '.', eq=True)))
[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

Instead of using the eq parameter you can also define a custom function where to split:

>>> list(filter(None, split(a, lambda x: x=='.')))
[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

In case you want to keep the '.'s you could also use the keep_before argument:

>>> list(filter(None, split(a, '.', eq=True, keep_before=True)))
[['this', 'is', 'a', 'cat', '.'], ['hello', '.'], ['she', 'is', 'nice', '.']]

Note that the library just makes it easier - it's easily possible (see the other answers) to accomplish this task without installing an additional library.

The filter can be removed if you don't expect '.' to appear at the beginning or end of your to-be-split list.


1 I'm the author of that library. It's available via pip or the conda-forge channel with conda.



来源:https://stackoverflow.com/questions/47604449/how-to-split-a-list-of-strings-into-sublists-of-strings-by-a-specific-string-ele

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!