Regex Expression For a String

余生分开走 2021-01-23 10:22

I want to split the following string in Python.

Sample string:

Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more

3 Answers
  •  我在风中等你
    2021-01-23 10:54

    Here is a working script, albeit a bit hackish:

    import re

    inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
    parts = re.findall(r'[A-Z]{2,}(?: [A-Z0-9.]+)*|(?![A-Z]{2})\w+(?: (?![A-Z]{2})\w+)*', inp)
    print(parts)
    

    This prints:

    ['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1',
     'and', 'SCENE 2', 'and more']
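
If you also need to know which pieces in that list are the uppercase chunks and which are ordinary text, one option is to wrap each branch of the pattern in a named group and read `lastgroup` from every match. This is only a sketch: the group names `heading` and `text` are my own labels, not something from the question.

```python
import re

inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"

# Same two branches as the findall() pattern, each wrapped in a named group.
# "heading" and "text" are hypothetical names chosen for illustration.
pattern = re.compile(
    r'(?P<heading>[A-Z]{2,}(?: [A-Z0-9.]+)*)'
    r'|(?P<text>(?![A-Z]{2})\w+(?: (?![A-Z]{2})\w+)*)'
)

# lastgroup is the name of the named group that matched, i.e. which branch was taken.
labeled = [(m.lastgroup, m.group()) for m in pattern.finditer(inp)]

for kind, chunk in labeled:
    print(f"{kind:8}{chunk}")
```

Using `finditer` instead of `findall` keeps the match objects around, so the branch name is available without testing each string a second time.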
    

    An explanation of the regex logic, which uses an alternation to match one of two cases:

    [A-Z]{2,}              match two or more capital letters
    (?: [A-Z0-9.]+)*       followed by zero or more tokens, each preceded by a space
                           and consisting only of capital letters, digits, or periods
    |                      OR
    (?![A-Z]{2})\w+        match a word which does NOT start with two capital letters
    (?: (?![A-Z]{2})\w+)*  then match zero or more such words, each preceded by a space
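
The breakdown above can also live in the code itself. Here is a sketch of the same pattern compiled with `re.VERBOSE`, which allows inline comments. In verbose mode unescaped whitespace in the pattern is ignored, so each literal space is written as `[ ]`. The name `PATTERN` is just an illustrative choice.

```python
import re

# Same regex as in the answer, laid out with re.VERBOSE.
# A bare space would be ignored in this mode, hence the [ ] character class.
PATTERN = re.compile(r"""
    [A-Z]{2,}                  # two or more capital letters
    (?:[ ][A-Z0-9.]+)*         # then zero or more tokens of capitals, digits, or periods
    |                          # OR
    (?![A-Z]{2})\w+            # a word that does not start with two capital letters
    (?:[ ](?![A-Z]{2})\w+)*    # then zero or more further such words
""", re.VERBOSE)

inp = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
parts = PATTERN.findall(inp)
print(parts)
```

This should give the same list as the one-line version, since only the layout of the pattern changes.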
    
