Split list into lists based on a character occurring inside of an element

…衆ロ難τιáo~ submitted on 2019-12-06 04:25:59

Question


In a list like the one below:

biglist = ['X', '1498393178', '1|Y', '15496686585007',
           '-82', '-80', '-80', '3', '3', '2', '|Y', '145292534176372',
           '-87', '-85', '-85', '3', '3', '2', '|Y', '11098646289856',
           '-91', '-88', '-89', '3', '3', '2', '|Y', '35521515162112',
           '-82', '-74', '-79', '3', '3', '2', '|Z',
           '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']

There could be some numerical elements preceded by a character. I would like to break this into sub-lists like below:

smallerlist = [
 ['X', '1498393', '1'],
 ['Y', '1549668', '-82', '-80', '-80', '3', '3', '2', ''],
 ['Y', '1452925', '-87', '-85', '-85', '3', '3', '2', ''],
 ['Y', '3552151', '-82', '-74', '-79', '3', '3', '2', ''],
 ['Z', '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']
]

As you can tell, sub-lists that start with the same character look similar, while others can have a different number of elements or entirely different elements. The main separator is the "|" character. I tried the following code to split up the list, but all I get is the same large list wrapped in another list, i.e. a list of length 1.

import itertools

delim = '|'
smallerlist = [list(y) for x, y in itertools.groupby(biglist, lambda z: z == delim)
                if not x]

Any ideas how to split it up successfully?


Answer 1:


First, a quick one-liner. It is not optimal in terms of space requirements, but it's short and sweet:

>>> smallerlist = [l.split(',') for l in ','.join(biglist).split('|')]
>>> smallerlist
[['X', '1498393178', '1'],
 ['Y', '15496686585007', '-82', '-80', '-80', '3', '3', '2', ''],
 ['Y', '145292534176372', '-87', '-85', '-85', '3', '3', '2', ''],
 ['Y', '11098646289856', '-91', '-88', '-89', '3', '3', '2', ''],
 ['Y', '35521515162112', '-82', '-74', '-79', '3', '3', '2', ''],
 ['Z', '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']]

Here we join all elements of the big list with a separator that does not appear in any element (for example ,), then split the resulting string on |, and then split each piece on the separator again to recover the original elements as a sublist.
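The intermediate steps are easier to follow on a short input. The sketch below uses a made-up `sample` list (not the data from the question) and also shows one possible way to drop the empty strings that come from elements such as '|Z'; whether you want to drop them depends on your data.

```python
# Hypothetical short input, chosen only to illustrate the steps.
sample = ['X', '1|Y', '2', '|Z', '3']
sep = ','

# The join separator must not occur inside any element,
# otherwise the second split would cut elements apart.
assert not any(sep in item for item in sample)

joined = sep.join(sample)        # 'X,1|Y,2,|Z,3'
chunks = joined.split('|')       # ['X,1', 'Y,2,', 'Z,3']
result = [chunk.split(sep) for chunk in chunks]
print(result)   # [['X', '1'], ['Y', '2', ''], ['Z', '3']]

# Optional: filter out empty strings left over from '|Z'.
# Note that this would also drop elements that were empty to begin with.
cleaned = [[s for s in chunk.split(sep) if s] for chunk in chunks]
print(cleaned)  # [['X', '1'], ['Y', '2'], ['Z', '3']]
```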

But if you're looking for a somewhat more efficient solution, you can use itertools.groupby on an intermediate sequence generated on the fly by the breakby() generator. Elements without the | separator are yielded as is, and those with the separator are split into three items: the first part, a list delimiter (e.g. None), and the second part.

from itertools import groupby

def breakby(biglist, sep, delim=None):
    for item in biglist:
        p = item.split(sep)
        yield p[0]
        if len(p) > 1:
            yield delim
            yield p[1]

smallerlist = [list(g) for k,g in groupby(breakby(biglist, '|', None),
                                          lambda x: x is not None) if k]
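To see what groupby is actually fed, it can help to print the intermediate stream for a short input. The sketch below repeats the breakby() definition so that it runs on its own; the `sample` list is made up for illustration.

```python
from itertools import groupby

def breakby(biglist, sep, delim=None):
    # Same generator as in the answer: emit a delimiter marker
    # between the two halves of any element that contains sep.
    for item in biglist:
        p = item.split(sep)
        yield p[0]
        if len(p) > 1:
            yield delim
            yield p[1]

# Hypothetical short input, not the data from the question.
sample = ['X', '1|Y', '2', '|Z', '3']

stream = list(breakby(sample, '|'))
print(stream)  # ['X', '1', None, 'Y', '2', '', None, 'Z', '3']

# groupby collects consecutive non-None items; the None markers
# form their own groups, which the "if k" filter throws away.
groups = [list(g) for k, g in groupby(stream, lambda x: x is not None) if k]
print(groups)  # [['X', '1'], ['Y', '2', ''], ['Z', '3']]
```

Because None serves as the marker, this relies on None never occurring as a real element, which holds for a list of strings.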



Answer 2:


It would be easier to join the elements of the list into a single string, split the string on the '|' character, then split each of the resulting pieces on whatever you used to join the list, probably a comma (,).

bigstr = ','.join(biglist)

[line.split(',') for line in bigstr.split('|')]

# returns
[['X', '1498393178', '1'],
 ['Y', '15496686585007', '-82', '-80', '-80', '3', '3', '2', ''],
 ['Y', '145292534176372', '-87', '-85', '-85', '3', '3', '2', ''],
 ['Y', '11098646289856', '-91', '-88', '-89', '3', '3', '2', ''],
 ['Y', '35521515162112', '-82', '-74', '-79', '3', '3', '2', ''],
 ['Z', '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']]

If the list is very long, you can also iterate over its items, starting a new sublist whenever you encounter a pipe character (|):

new_biglist = []
sub_list = []
for item in biglist:
    if '|' in item:
        # the text before the pipe ends the current sublist,
        # the text after it starts the next one
        end, start = item.split('|')
        sub_list.append(end)
        new_biglist.append(sub_list)
        sub_list = [start]
    else:
        sub_list.append(item)
new_biglist.append(sub_list)  # don't lose the final sublist

new_biglist
# returns:
[['X', '1498393178', '1'],
 ['Y', '15496686585007', '-82', '-80', '-80', '3', '3', '2', ''],
 ['Y', '145292534176372', '-87', '-85', '-85', '3', '3', '2', ''],
 ['Y', '11098646289856', '-91', '-88', '-89', '3', '3', '2', ''],
 ['Y', '35521515162112', '-82', '-74', '-79', '3', '3', '2', ''],
 ['Z', '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']]
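Since this approach is aimed at very long lists, the same loop could also be written as a generator, so that sublists are produced one at a time instead of being collected up front. This is only a sketch: the name `iter_sublists` is hypothetical, and it assumes each element contains at most one separator.

```python
def iter_sublists(items, sep='|'):
    """Lazily yield sublists, cutting at elements that contain sep.

    Assumes at most one sep per element; anything after a second
    sep would stay inside the first element of the next sublist.
    """
    current = []
    for item in items:
        if sep in item:
            # partition() splits at the first sep only
            end, _, start = item.partition(sep)
            current.append(end)
            yield current
            current = [start]
        else:
            current.append(item)
    # the loop never emits the last group, so emit it here
    yield current

# Hypothetical short input, not the data from the question.
sample = ['X', '1|Y', '2', '|Z', '3']
sublists = list(iter_sublists(sample))
print(sublists)  # [['X', '1'], ['Y', '2', ''], ['Z', '3']]
```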



Answer 3:


Here is a solution to a similar problem that I couldn't find an answer to: how to split a list into sublists delimited by a member of the list, e.g. a character:

l = ['r', 'g', 'b', ':',
     'D', 'E', 'A', 'D', '/',
     'B', 'E', 'E', 'F', '/',
     'C', 'A', 'F', 'E']

def split_list(thelist, delimiters):
    ''' Split a list into sub lists, depending on a delimiter.

        delimiters - item or tuple of item
    '''
    results = []
    sublist = []

    for item in thelist:
        if item in delimiters:
            results.append(sublist) # old one
            sublist = []            # new one
        else:
            sublist.append(item)

    if sublist:  # last bit
        results.append(sublist)

    return results


print(
    split_list(l, (':', '/'))
)
# => [['r', 'g', 'b'], ['D', 'E', 'A', 'D'], 
#     ['B', 'E', 'E', 'F'], 
#     ['C', 'A', 'F', 'E']]
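When the delimiter is a whole element, as it is here, an itertools.groupby version along the lines of the attempt in the question also works. The following is a sketch for comparison, not a drop-in replacement: unlike split_list, it never produces empty sublists, because consecutive delimiters end up in a single group that is discarded.

```python
from itertools import groupby

l = ['r', 'g', 'b', ':',
     'D', 'E', 'A', 'D', '/',
     'B', 'E', 'E', 'F', '/',
     'C', 'A', 'F', 'E']

delimiters = (':', '/')

# The key is True for delimiter elements and False for everything else;
# keep only the runs of non-delimiter elements.
result = [list(group)
          for is_delim, group in groupby(l, key=lambda item: item in delimiters)
          if not is_delim]

print(result)
# [['r', 'g', 'b'], ['D', 'E', 'A', 'D'], ['B', 'E', 'E', 'F'], ['C', 'A', 'F', 'E']]
```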



Answer 4:


You don't need regex or anything of the sort. A simple loop and str.split() should be more than enough, at least if you're after an actually efficient solution:

biglist = ['X', '1498393178', '1|Y', '15496686585007', '-82', '-80', '-80', '3', '3', '2',
           '|Y', '145292534176372', '-87', '-85', '-85', '3', '3', '2', '|Y',
           '11098646289856', '-91', '-88', '-89', '3', '3', '2', '|Y', '35521515162112',
           '-82', '-74', '-79', '3', '3', '2', '|Z', '0.0', '0.0', '0', '0', '0', '0',
           '0', '4', '0', '154']

delimiter = "|"
smaller_list = [[]]
for x in biglist:
    if delimiter in x:
        a, b = x.split(delimiter)
        if a:  # remove the check if you also want the empty elements
            smaller_list[-1].append(a)
        smaller_list.append([])
        if b:  # remove the check if you also want the empty elements
            smaller_list[-1].append(b)
    else:
        smaller_list[-1].append(x)

print(smaller_list)
# [['X', '1498393178', '1'],
#  ['Y', '15496686585007', '-82', '-80', '-80', '3', '3', '2'],
#  ['Y', '145292534176372', '-87', '-85', '-85', '3', '3', '2'],
#  ['Y', '11098646289856', '-91', '-88', '-89', '3', '3', '2'],
#  ['Y', '35521515162112', '-82', '-74', '-79', '3', '3', '2'],
#  ['Z', '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']]
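The loop above unpacks exactly two parts, so an element with more than one delimiter (for example 'a|b|c') would raise a ValueError. If that can happen in your data, one possible generalization is sketched below; the function name `split_on_embedded` and the `keep_empty` flag are made up for this example.

```python
def split_on_embedded(items, delimiter='|', keep_empty=False):
    """Split items into sublists at every delimiter found inside an element."""
    groups = [[]]
    for item in items:
        if delimiter not in item:
            # elements without a delimiter are kept unchanged
            groups[-1].append(item)
            continue
        for i, part in enumerate(item.split(delimiter)):
            if i > 0:
                # every delimiter inside the element starts a new sublist
                groups.append([])
            if part or keep_empty:
                groups[-1].append(part)
    return groups

# Hypothetical inputs, not the data from the question.
print(split_on_embedded(['X', '1|Y', '2', '|Z', '3']))
# [['X', '1'], ['Y', '2'], ['Z', '3']]
print(split_on_embedded(['a', 'b|c|d', 'e']))
# [['a', 'b'], ['c'], ['d', 'e']]
```

Passing keep_empty=True keeps the empty strings produced by elements such as '|Z', which corresponds to removing the two checks in the loop above.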


Source: https://stackoverflow.com/questions/45281189/split-list-into-lists-based-on-a-character-occurring-inside-of-an-element
