Search for permutation of characters of a substring in Python

扶醉桌前 提交于 2019-12-20 03:23:33

问题


I am trying to extract the occurrences of a string and of all the permutations of its characters from a line of text.

For example, I need to extract the string t = 'ABC' and all its permutations: 'ABC', 'CAB', 'BCA', 'BAC', 'CBA', from the following string s:

s = 'ABCXABCXXACXXBACXXBCA'

The results is: ABC, ABC, BAC, BCA

The string t should be of any length and can contain any characters in [A-Z], [a-z] and [0-9]

Is there a way to obtain the result by using regular expressions in Python?

I know I could just build a list of all the permutations and then make a separate search for all the items in the list, but I was wondering whether regular expressions can provide the results in a more compact and faster way.


回答1:


Let me sketch an algorithm to solve the problem. It is not a problem to be solved with regex.

This solution maintains a sliding window, and checks the frequency of characters in the window with that of t.

Below is the pseudocode of the algorithm:

function searchPermutation(inpStr, t):
    // You may want to check t against the regex ^[A-Za-z0-9]+$ here

    // Do a frequency counting of character in t
    // For example, t = 'aABBCCC'
    // Then freq = { 'A': 1, 'B': 2, 'C': 3, 'a': 1 }
    freq = frequency(t)

    // Create an empty dict
    window = {}
    // Number of characters in window
    count = 0
    // List of matches
    result = []

    for (i = 0; i < inpStr.length; i++):
        // If the current character is a character in t
        if inpStr[i] in freq:
            // Add the character at current position
            window[inpStr[i]]++

            // If number of character in window is equal to length of t
            if count == t.length:
                // Remove the character at the end of the window
                window[inpStr[i - t.length]]--
                // The count is kept the same here
            else: // Otherwise, increase the count
                count++

            // If all frequencies in window is the same as freq
            if count == t.length and window == freq:
                // Add to the result a match at (i - t.length + 1, i + 1)
                // We can retrieve the string later with substring
                result.append((i - t.length + 1, i + 1))

                // Reset the window and count (prevent overlapping match)
                // Remove the 2 line below if you want to include overlapping match
                window = {}
                count = 0
        else: // If current character not in t
            // Reset the window and count
            window = {}
            count = 0

    return result

This should solve the general problem for any t.




回答2:


A regex solution:

([ABC])(?!\1)([ABC])(?!\1)(?!\2)[ABC]


来源:https://stackoverflow.com/questions/26294878/search-for-permutation-of-characters-of-a-substring-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!