How to slice numbered lists into sublists

Submitted by 我是研究僧i on 2019-12-11 15:39:24

Question


I have opened a file and used readlines() and split() with the regex '\t' to remove TABs, which has resulted in the following lists:

["1", "cats", "--,"]
["2", "chase", "--,"]
["3", "dogs", "--,"]
["1", "the", "--,"]
["2", "car", "--,"]
["3", "is", "--,"]
["4", "gray", "--,"]

Now I want to slice these into sublists such as "cats chase dogs" and "the car is gray", using the integers at index [0] as sentence boundaries. For instance, rows 1 - 3 would form the sublist "cats chase dogs", then the count restarts and rows 1 - 4 form "the car is gray", and so on for the rest of the lists, so that I get sublists like ["the", "car", "is", "gray"]. How do I do this?

I've tried this, but I'm getting an error:

Can't concatenate int + str

The "i" in the for loop is treated as a string element instead of an integer:

with open(buffer, 'r') as f:
    words = []
    for line in f:
        items = line.split('\t')[:1]
        for i in items:
            while i>1:
                i = i+1
                print i

Answer 1:


Something like:

from itertools import groupby

with open('yourfile') as fin:
    # split lines
    lines = (line.split() for line in fin)
    # group by consecutive ints
    grouped = groupby(enumerate(lines), lambda pair: pair[0] - int(pair[1][0]))
    # build sentences from words in groups
    sentences = [' '.join(el[1][1] for el in g) for k, g in grouped]
    # ['cats chase dogs', 'the car is gray']

NB: This works based on your example data of:

example = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"]
]
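
To see why grouping on idx - int(el[0]) works, it may help to print the keys: within a run of consecutive numbers the difference between the position and the number stays constant, and it jumps when the numbering restarts at 1. The following is a minimal sketch on the in-memory example data rather than a file; the helper name group_key is hypothetical and not part of the original answer.

```python
from itertools import groupby

example = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"],
]

def group_key(pair):
    # pair is (position, row) as produced by enumerate()
    idx, row = pair
    return idx - int(row[0])

# One key per row: constant inside a sentence, different across sentences
keys = [group_key(pair) for pair in enumerate(example)]
print(keys)  # [-1, -1, -1, 2, 2, 2, 2]

# Join the word (index 1 of each row) for every group of equal keys
sentences = [
    " ".join(row[1] for _, row in group)
    for _, group in groupby(enumerate(example), key=group_key)
]
print(sentences)  # ['cats chase dogs', 'the car is gray']
```

This relies on the numbering inside each sentence being consecutive and starting again at 1, as in the example data.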



Answer 2:


Choosing suitable data structures makes the job easier:

container = [["1", "cats", "--,"],
             ["2", "chase", "--,"],
             ["3", "dogs", "--,"],
             ["1", "the", "--,"],
             ["2", "car", "--,"],
             ["3", "is", "--,"],
             ["4", "gray", "--,"]]

Nest your lists in a container list, then use a dictionary to store the output lists:

from collections import defaultdict

out = defaultdict(list)              # Initialize dictionary for output
key = 0                              # Initialize key  

for idx, word, _ in container:       # Unpack sublists
    if int(idx) == 1:                # Check if we are at start of new sentence
        key += 1                     # Increment key for new sentence
    out[key].append(word)            # Add word to list

Gives:

{
    1: ['cats', 'chase', 'dogs'], 
    2: ['the', 'car', 'is', 'gray']
}
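
If a plain list of sublists is preferred over a dictionary, the same idea can be sketched as below. This is only an illustration under assumptions: the function name split_sentences is hypothetical, the code targets Python 3, and raw_lines stands in for the tab-separated lines of the file. The int() conversion matters because split() returns strings, which is what causes the int + str error in the question.

```python
def split_sentences(rows):
    """Group words into sentences; a row numbered 1 starts a new sentence."""
    sentences = []
    for number, word, *_ in rows:
        # split() yields strings, so convert before comparing with an int
        if int(number) == 1 or not sentences:
            sentences.append([])
        sentences[-1].append(word)
    return sentences

# Tab-separated lines as they might come out of the file (assumed layout)
raw_lines = [
    "1\tcats\t--,\n",
    "2\tchase\t--,\n",
    "3\tdogs\t--,\n",
    "1\tthe\t--,\n",
    "2\tcar\t--,\n",
    "3\tis\t--,\n",
    "4\tgray\t--,\n",
]

rows = [line.rstrip("\n").split("\t") for line in raw_lines]
sentences = split_sentences(rows)
print(sentences)  # [['cats', 'chase', 'dogs'], ['the', 'car', 'is', 'gray']]
```

Joining each sublist with ' '.join(...) then gives the sentence strings "cats chase dogs" and "the car is gray".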


Source: https://stackoverflow.com/questions/20826436/how-to-slice-numbered-lists-into-sublists
