Recursive function to extract elements from deeply nested lists/tuples

Submitted by 独自空忆成欢 on 2019-12-02 05:29:40

Question


I want to write a function that extracts elements from deeply nested tuples and lists. Say I have something like this:

l = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])

And I want a flat list without 'THIS' and 'THAT':

list = ['a', 'b', 'c', 'd', 'e', 'f']

Here's what I have so far:

def extract(List):
    global terms
    terms = []
    for i in List:
        if type(i) is not str:
            extract(i)
        else:
            if i is not "THIS" and i is not "THAT":
                terms.append(i)
    return terms

But I keep getting list = ['d', 'e', 'f']. It looks like terms = [] is being reset after the loop reaches 'c'.


Answer 1:


You're doing terms = [] at the top of the function, so every time the function calls itself recursively, it runs terms = [] again. That rebinds the global to a new empty list and throws away whatever the outer calls had already collected.

The quickest solution is to write a simple wrapper:

def _extract(List):
    global terms
    for i in List:
        if type(i) is not str:
            _extract(i)
        else:
            if i is not "THIS" and i is not "THAT":
                terms.append(i)
    return terms

def extract(List):
    global terms
    terms = []
    return _extract(List)

One more thing: You shouldn't use is to test for string equality (except in very, very special cases). That tests that they're the same string object in memory. It will happen to work here, at least in CPython (because both "THIS" strings are constants in the same module—and even if they weren't, they'd get interned)—but that's not something you want to rely on. Use ==, which tests that they both mean the same string, whether or not they're actually the identical object.
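A small illustration of the difference, assuming CPython: the second string below is built at runtime with str.join, so it is equal to the literal but is not guaranteed to be the same object. The variable names are just for this sketch.

```python
# A string literal (CPython interns short identifier-like literals).
literal = "THIS"
# The same text, built at runtime, so it is a separate object.
built = "".join(["TH", "IS"])

# == compares values: this is True on every Python implementation.
print(literal == built)
# `is` compares object identity: the result is an implementation detail
# (typically False in CPython for a string built this way).
print(literal is built)
```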

Testing types for identity is useful a little more often, but still not usually what you want. In fact, you usually don't even want to test types for equality. You don't often have subclasses of str—but if you did, you'd probably want to treat them as str (since that's the whole point of subtyping). And this is even more important for types that you do subclass from more often.
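To make the subclass point concrete, here is a sketch using a hypothetical str subclass named Label. With an exact type check, a Label is treated as "not a string", so the recursion walks into it and splits it into characters; isinstance() keeps it whole.

```python
class Label(str):
    """Hypothetical str subclass, used only for this illustration."""

def leaves_exact(seq):
    # Exact type check, as in the question: a Label fails `type(i) is str`.
    out = []
    for i in seq:
        if type(i) is not str:
            out.extend(leaves_exact(i))
        else:
            out.append(i)
    return out

def leaves_isinstance(seq):
    # isinstance() also accepts subclasses, so a Label counts as a string.
    out = []
    for i in seq:
        if not isinstance(i, str):
            out.extend(leaves_isinstance(i))
        else:
            out.append(i)
    return out

data = [Label("ab"), ["c"]]
print(leaves_exact(data))       # ['a', 'b', 'c']: the Label was split apart
print(leaves_isinstance(data))  # ['ab', 'c']
```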

If you don't completely understand all of that, the simple guideline is to just never use is unless you know you have a good reason to.

So, change this:

if i is not "THIS" and i is not "THAT":

… to this:

if i != "THIS" and i != "THAT":

Or, maybe even better (definitely better if you had, say, four strings to check instead of two), use a set membership test instead of anding together multiple tests:

if i not in {"THIS", "THAT"}:

And likewise, change this:

if type(i) is not str:

… to this:

if not isinstance(i, str):

But while we're being all functional here, why not use a closure to eliminate the global?

def extract(List):
    terms = []
    def _extract(List):
        # nonlocal is optional here: terms is only mutated (append), never rebound
        nonlocal terms
        for i in List:
            if not isinstance(i, str):
                _extract(i)
            else:
                if i not in {"THIS", "THAT"}:
                    terms.append(i)
        return terms
    return _extract(List)

This isn't the way I'd solve this problem (wim's answer is probably what I'd do if given this spec and told to solve it with recursion), but this has the virtue of preserving the spirit of (and most of the implementation of) your existing design.
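A further variant, not part of the original answer: pass the accumulator down as an argument, so neither a global nor a closure is needed. The exclude and terms parameters are hypothetical additions for this sketch; terms defaults to None (not []) so that each top-level call starts with a fresh list instead of sharing one mutable default.

```python
def extract(data, exclude=frozenset({"THIS", "THAT"}), terms=None):
    # Create the accumulator only on the outermost call.
    if terms is None:
        terms = []
    for item in data:
        if isinstance(item, str):
            if item not in exclude:
                terms.append(item)
        else:
            # Recurse with the same list, so nothing collected so far is lost.
            extract(item, exclude, terms)
    return terms

l = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])
print(extract(l))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Calling extract(l) a second time returns the same result, because no state survives between top-level calls.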




Answer 2:


It's a good idea to separate the concerns of "flattening" and "filtering". Decoupled code is easier to write and easier to test. So let's first write a "flattener" using recursion:

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

def flatten(collection):
    for x in collection:
        if isinstance(x, Iterable) and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x

Then extract, filtering out the excluded values:

def extract(data, exclude=()):
    yield from (x for x in flatten(data) if x not in exclude)

L = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])
print(*extract(L, exclude={'THIS', 'THAT'}))
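Because the two stages are decoupled, each can be checked on its own. The sketch below repeats both definitions so it runs standalone, checks flatten by itself, and wraps extract's generator in list() to get a list rather than printed output. The comment on the str check is an explanation added here, not part of the original answer.

```python
from collections.abc import Iterable

def flatten(collection):
    for x in collection:
        # Recurse into any iterable except str: iterating "a" yields "a"
        # again, so recursing into strings would never bottom out.
        if isinstance(x, Iterable) and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x

def extract(data, exclude=()):
    yield from (x for x in flatten(data) if x not in exclude)

L = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])

# Stage 1 alone: every leaf, in order, nothing removed yet.
flat = list(flatten(L))
print(flat)    # ['THIS', 'THAT', 'a', 'b', 'c', 'THAT', 'd', 'e', 'f']

# Stage 2: collect the filtered leaves into a list.
result = list(extract(L, exclude={'THIS', 'THAT'}))
print(result)  # ['a', 'b', 'c', 'd', 'e', 'f']
```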



Answer 3:


Assuming that the first element of each tuple can be disregarded, and that we should recurse into the list that is its second element, we can do this:

def extract(node):
    if isinstance(node, tuple):
        return extract(node[1])
    if isinstance(node, list):
        return [item for sublist in [extract(elem) for elem in node] for item in sublist]
    # Wrap a leaf in a list so the caller can iterate over it without
    # splitting a multi-character string like 'dk' into characters.
    return [node]

The list comprehension is a little dense; here's the same with loops:

def extract(node):
    if isinstance(node, tuple):
        return extract(node[1])
    if isinstance(node, list):
        result = []
        for elem in node:
            # extract() always returns a list, so add its items one by one
            for item in extract(elem):
                result.append(item)
        return result
    # Leaf: wrap it so it can be iterated like any other result.
    return [node]
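Since this approach drops the first element of each tuple by position rather than by value, it behaves differently from the filtering answers when the data doesn't follow the assumed (head, list) shape. A sketch of the comprehension version with leaves wrapped in a one-element list (so multi-character strings such as 'yz' stay whole), followed by two inputs that show the difference:

```python
def extract(node):
    # Tuples: skip the head by position and recurse into the second element.
    if isinstance(node, tuple):
        return extract(node[1])
    # Lists: concatenate the results for each element.
    if isinstance(node, list):
        return [item for elem in node for item in extract(elem)]
    # Anything else is a leaf; wrap it so callers can always iterate the result.
    return [node]

print(extract(('OTHER', ['x', 'yz'])))  # ['x', 'yz']: 'OTHER' dropped by position
print(extract(['THIS', 'y']))           # ['THIS', 'y']: kept, it is not a tuple head
```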



Answer 4:


This recursive function, together with the list .extend() method, should do the trick:

def func(lst):
    new_lst = []
    for i in lst:
        if i != 'THAT' and i != 'THIS':
            if isinstance(i, (list, tuple)):
                new_lst.extend(func(i))
            else:
                new_lst.append(i)
    return new_lst

l = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['dk', 'e', 'f'])])
print(func(l))

['a', 'b', 'c', 'dk', 'e', 'f']



Source: https://stackoverflow.com/questions/49247894/recursive-function-for-extract-elements-from-deep-nested-lists-tuples
