Split string based on regex

后悔当初 2020-12-01 00:44

What is the best way to split a string like "HELLO there HOW are YOU" by upper case words (in Python)?

So I'd end up with an array like such:

3 Answers
  • 2020-12-01 00:56

    You could use a lookahead:

    re.split(r'[ ](?=[A-Z]+\b)', input)
    

    This splits at every space that is followed by a run of upper-case letters ending at a word boundary.

    Note that the square brackets are only for readability and could as well be omitted.
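
    As a minimal sketch of the lookahead split, assuming the sample sentence from the question (the variable names `text`, `with_brackets` and `without_brackets` are illustrative, not from the original answer):

```python
import re

text = "HELLO there HOW are YOU"

# Split at each space that is followed by an all-caps word.
with_brackets = re.split(r'[ ](?=[A-Z]+\b)', text)

# The same pattern with the square brackets around the space omitted.
without_brackets = re.split(r' (?=[A-Z]+\b)', text)

print(with_brackets)     # ['HELLO there', 'HOW are', 'YOU']
print(without_brackets)  # ['HELLO there', 'HOW are', 'YOU']
```

    Because the lookahead consumes no characters, the upper-case words stay in the result; only the space itself is removed.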

    If it is enough that the first letter of a word is upper case (so that you also split in front of Hello), it gets even simpler:

    re.split(r'[ ](?=[A-Z])', input)
    

    Now this splits at every space followed by any upper-case letter.
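
    To see how the two patterns differ, here is a small comparison on a made-up sentence that contains a capitalized word (the sentence and the names `strict` and `loose` are assumptions for illustration):

```python
import re

text = "HELLO there Hello world HOW are YOU"

# Requires the whole following word to be upper case.
strict = re.split(r'[ ](?=[A-Z]+\b)', text)

# Requires only the first letter of the following word to be upper case.
loose = re.split(r'[ ](?=[A-Z])', text)

print(strict)  # ['HELLO there Hello world', 'HOW are', 'YOU']
print(loose)   # ['HELLO there', 'Hello world', 'HOW are', 'YOU']
```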

  • 2020-12-01 00:56

    Your question contains the string literal "\b[A-Z]{2,}\b", but without the r prefix Python reads each \b as a backspace character rather than as a regex word boundary.

    Try: r"\b[A-Z]{2,}\b".
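
    The difference is easy to check directly. The sketch below uses `re.findall` on the question's sample sentence only to show whether the pattern matches at all; the names `plain`, `raw` and `text` are illustrative:

```python
import re

plain = "\b[A-Z]{2,}\b"   # here \b is the backspace character (0x08)
raw = r"\b[A-Z]{2,}\b"    # here \b reaches the regex engine as a word boundary

text = "HELLO there HOW are YOU"

print(plain[0] == "\x08")        # True
print(len(plain), len(raw))      # 11 13
print(re.findall(plain, text))   # []
print(re.findall(raw, text))     # ['HELLO', 'HOW', 'YOU']
```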

  • 2020-12-01 01:13

    I suggest

    l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)
    

    Check this demo.
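
    A rough illustration of what this pattern does, written with a raw string so that \s reaches the regex engine unchanged. The second sentence is an assumed example, chosen to show that the trailing `(?!.\s)` skips one-letter words such as "I":

```python
import re

# Whitespace that is not at the start of the string, is followed by an
# upper-case letter, and is not followed by a single-character word.
pattern = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

first = pattern.split("HELLO there HOW are YOU")
second = pattern.split("HELLO there I am HOW are YOU")

print(first)   # ['HELLO there', 'HOW are', 'YOU']
print(second)  # ['HELLO there I am', 'HOW are', 'YOU']
```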
