How to ignore whitespace in a regular expression subject string?

梦毁少年i 2020-11-27 03:33

Is there a simple way to ignore the whitespace in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats",

6 Answers
  •  感情败类
    2020-11-27 04:22

    One way to automate this (the example below is in Python, but it can be ported to any language):

    Strip the whitespace beforehand and save the positions of the non-whitespace characters, so that you can later map the match boundaries back to positions in the original string:

    import re

    def regex_search_ignore_space(regex, string):
        no_spaces = ''
        char_positions = []
    
        for pos, char in enumerate(string):
            if re.match(r'\S', char):  # upper \S matches non-whitespace chars
                no_spaces += char
                char_positions.append(pos)
    
        match = re.search(regex, no_spaces)
        if not match:
            return match
    
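        # match.start() and match.end() are indices into the spaceless
        # string (that is the string we searched). match.end() is exclusive,
        # so map the last matched character back and add 1; indexing
        # char_positions[match.end()] would overshoot or raise IndexError.
        if match.start() == match.end():
            return ''  # empty match: nothing to map back
        start = char_positions[match.start()]  # in the original string
        end = char_positions[match.end() - 1] + 1  # in the original string
        matched_string = string[start:end]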
    
        # the match WITH spaces is returned.
        return matched_string
    
    with_spaces = 'a li on and a cat'
    print(regex_search_ignore_space('lion', with_spaces))
    # prints 'li on'
    

    If you want to go further, you can build a match-like object and return it instead, which makes the helper more convenient to use.
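
    As far as I know, Python's re.Match objects cannot be instantiated directly, so one way to sketch this is a small result type of your own. The names SpacelessMatch and search_ignore_space below are hypothetical (they are not part of the re module), and the handling of empty matches is just one possible choice:

```python
import re
from typing import NamedTuple, Optional


class SpacelessMatch(NamedTuple):
    """Hypothetical stand-in for a match object; not part of the re module."""
    start: int  # start index in the original string
    end: int    # end index (exclusive) in the original string
    text: str   # matched slice of the original string, whitespace included


def search_ignore_space(regex: str, string: str) -> Optional[SpacelessMatch]:
    # Positions of all non-whitespace characters in the original string.
    char_positions = [m.start() for m in re.finditer(r'\S', string)]
    no_spaces = ''.join(string[i] for i in char_positions)

    match = re.search(regex, no_spaces)
    if match is None:
        return None

    if match.start() == match.end():
        # Empty match: report an empty span at the corresponding position
        # (or at the end of the string if the match sits past the last char).
        if match.start() < len(char_positions):
            pos = char_positions[match.start()]
        else:
            pos = len(string)
        return SpacelessMatch(pos, pos, '')

    start = char_positions[match.start()]
    # match.end() is exclusive, so map the last matched character and add 1.
    end = char_positions[match.end() - 1] + 1
    return SpacelessMatch(start, end, string[start:end])


result = search_ignore_space('lion', 'a li on and a cat')
print(result)
```

    The caller then gets both the matched text and its boundaries in the original string, which is usually what is needed for highlighting or replacing.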

    The performance of this function can of course be optimized; this example only shows the path to a solution.
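
    As one possible optimization, the stripping step can scan whole runs of non-whitespace characters instead of calling re.match once per character and growing a string with +=. This is only a sketch; the helper name strip_with_positions is made up for illustration, and whether the difference matters depends on the size of your input:

```python
import re


def strip_with_positions(string):
    """Return (string without whitespace, original index of each kept char)."""
    chunks = []
    char_positions = []
    # Each match is a maximal run of non-whitespace characters.
    for run in re.finditer(r'\S+', string):
        chunks.append(run.group())
        char_positions.extend(range(run.start(), run.end()))
    # Join once at the end instead of concatenating inside the loop.
    return ''.join(chunks), char_positions


no_spaces, char_positions = strip_with_positions('a li on')
print(no_spaces, char_positions)
```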
