Remove item from list based on the next item in same list

悲&欢浪女 2021-02-18 17:08

I just started learning Python, and I have a sorted list of protein sequences (59,000 in total), some of which overlap. Here is a toy list as an example:

11 answers
  •  一个人的身影
    2021-02-18 18:01

    This is not an exact match for your expected output. You say the list is sorted, but it isn't (see EOEUDNBNUWD followed by EAEUDNBNUW), and I don't know why EOEUDNBNUWD is missing from your expected output. So either your expectations are misstated or I've misread your question.

    (Ah, yes, I see: the notion of overlap throws a wrench into the sort-and-startswith approach.)

    It might be nice for the OP to restate that particular aspect. I read @DSM's comment without really understanding his concern; now I do.

    li = sorted([i.strip() for i in """
    ABCDE
    ABCDEFG
    ABCDEFGH
    ABCDEFGHIJKLMNO
    CEST
    DBTSFDE
    DBTSFDEO
    EOEUDNBNUW
    EOEUDNBNUWD
    EAEUDNBNUW
    FEOEUDNBNUW
    FG
    FGH""".splitlines() if i.strip()])
    
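    def get_iter(li):
        """Yield each sequence that the next one in sorted order does not extend."""
        prev = ""
        for i in li:
            # prev is complete once the next sequence no longer starts with it
            if not i.startswith(prev):
                yield prev
            prev = i
        if prev:  # emit the last sequence, unless the list was empty
            yield prev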
    
    for v in get_iter(li):
        print(v)
    

    Output:

    ABCDEFGHIJKLMNO
    CEST
    DBTSFDEO
    EAEUDNBNUW
    EOEUDNBNUWD
    FEOEUDNBNUW
    FGH
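
For the overlap case that the startswith approach misses, here is a rough sketch. It assumes "overlap" means one sequence appears anywhere inside another, which is my reading and not something the question confirms. The helper name `drop_contained` is made up for illustration and is not part of the code above.

```python
def drop_contained(seqs):
    """Return the sequences that are not a substring of any other sequence.

    Assumption: "overlap" means full containment, not a partial
    suffix/prefix overlap between two sequences.
    """
    kept = []
    # Visit the longest sequences first, so anything that could contain
    # the current sequence has already been considered for `kept`.
    for s in sorted(set(seqs), key=len, reverse=True):
        if not any(s in k for k in kept):
            kept.append(s)
    return sorted(kept)


toy = [
    "ABCDE", "ABCDEFG", "ABCDEFGH", "ABCDEFGHIJKLMNO", "CEST",
    "DBTSFDE", "DBTSFDEO", "EOEUDNBNUW", "EOEUDNBNUWD", "EAEUDNBNUW",
    "FEOEUDNBNUW", "FG", "FGH",
]

for v in drop_contained(toy):
    print(v)
```

On the toy list this differs from the startswith version in one place: FGH is dropped as well, because it occurs inside ABCDEFGHIJKLMNO. Note that the check compares each sequence against everything kept so far, so it is quadratic in the worst case and may be slow on 59,000 sequences.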
    
