Python looping through string and matching it with a wildcard pattern

醉酒当歌 submitted on 2020-01-21 19:24:12

Question


string1="abc"
string2="abdabcdfg"

I want to check whether string1 is a substring of string2. However, the pattern may contain wildcard characters: "." can be any letter, "y" can be "a" or "d", and "x" can be "b" or "c". As a result, ".yx" should count as a substring of string2.

How can I code this using only one loop? I want to loop through string2 and make a comparison at each index. I tried a dictionary, but I want to use a loop. My code:

def wildcard(string,substring):
    sum=""
    table={'A': '.', 'C': '.', 'G': '.', 'T': '.','A': 'x', 'T': 'x', 'C': 'y', 'G': 'y'}
    for c in strand:
        if (c in table) and table[c] not in sum:
            sum+=table[c]
        elif c not in table:
            sum+=c
    if sum==substring:
        return True
    else:
        return False

print wildcard("TTAGTTA","xyT.")#should be true

Answer 1:


I know you are specifically asking for a solution using a loop. However, I would suggest a different approach: you can easily translate your pattern into a regular expression. Regular expressions are a similar language for string patterns, just much more powerful. You can then use the re module to check whether that regular expression (and thus your substring pattern) can be found in the string.

import re

def to_regex(pattern, table):
    # join substitutions from table, using c itself as default
    return ''.join(table.get(c, c) for c in pattern)

# each wildcard symbol maps to a regex character class
symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
print(re.findall(to_regex('.+#', symbols), 'abdabcdfg'))
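Two caveats about the regex translation may be worth keeping in mind: re.findall only returns non-overlapping matches, and any pattern character missing from the table is passed to the regex engine unescaped. The following is a minimal sketch of one way to handle both, not part of the original answer; the names to_regex_escaped and find_overlapping are hypothetical, and wrapping the pattern in a lookahead is just one possible technique.

```python
import re

def to_regex_escaped(pattern, table):
    # Hypothetical variant of to_regex: characters that are not in the
    # table are escaped so they are matched literally.
    return ''.join(table.get(c, re.escape(c)) for c in pattern)

def find_overlapping(pattern, table, string):
    # A zero-width lookahead with a capturing group lets the regex engine
    # report a match at every starting position, including overlapping ones.
    regex = re.compile('(?=(' + to_regex_escaped(pattern, table) + '))')
    return [m.group(1) for m in regex.finditer(string)]

symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
print(find_overlapping('.+#', symbols, 'abdabcdfg'))
print(find_overlapping('..', symbols, 'abc'))
```

For the pattern '..' on 'abc', plain re.findall would report only 'ab', while the lookahead version also reports the overlapping 'bc'.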

If you prefer a more "hands-on" solution, you can use the following version, which checks the pattern at every possible starting position with explicit loops.

def find_matches(pattern, table, string):
    for i in range(len(string) - len(pattern) + 1):
        # for each possible starting position, check the pattern
        for j, c in enumerate(pattern):
            if string[i+j] not in table.get(c, c):
                break # character does not match
        else:
            # loop completed without triggering the break
            yield string[i : i + len(pattern)]

symbols = {'.': 'abcdefghijklmnopqrstuvwxyz', '#': 'ad', '+': 'bc'}
print(list(find_matches('.+#', symbols, 'abdabcdfg')))

The output in both cases is ['abd', 'bcd'], i.e. the pattern is found twice in the string with these substitutions.
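Applied to the example from the question, the same idea can be written with a single explicit loop over the starting positions, with the per-character comparison folded into all(). This is only a sketch: the table dna_symbols is an assumption about what the question intends ('.' for any base, 'x' for A or T, 'y' for C or G), and the extra table parameter is not in the asker's original signature.

```python
def wildcard(string, pattern, table):
    """Return True if pattern matches some substring of string."""
    # One explicit loop: try every possible starting position.
    for i in range(len(string) - len(pattern) + 1):
        window = string[i:i + len(pattern)]
        # A pattern character without a table entry must match itself.
        if all(ch in table.get(p, p) for ch, p in zip(window, pattern)):
            return True
    return False

# Assumed wildcard table for the question's DNA example.
dna_symbols = {'.': 'ACGT', 'x': 'AT', 'y': 'CG'}
print(wildcard("TTAGTTA", "xyT.", dna_symbols))  # True ("AGTT" at index 2)
```

Unlike find_matches, this version stops at the first match and only reports whether one exists, which is what the question's True/False check needs.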



Source: https://stackoverflow.com/questions/24677290/python-looping-through-string-and-matching-it-with-with-wildcard-pattern
