A pythonic way to insert a space before capital letters

邮差的信 提交于 2019-12-03 00:01:17

You could try:

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'

If there are consecutive capitals, then Gregs result could not be what you look for, since the \w consumes the caracter in front of the captial letter to be replaced.

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'

A look-behind would solve this:

>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'

Perhaps shorter:

>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
Markus Jarderot

Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?

Edit: Maybe better to include it here.

re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)

For example:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]

Maybe you would be interested in one-liner implementation without using regexp:

''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()

With regexes you can do this:

re.sub('([A-Z])', r' \1', str)

Of course, that will only work for ASCII characters, if you want to do Unicode it's a whole new can of worms :-)

If you have acronyms, you probably do not want spaces between them. This two-stage regex will keep acronyms intact (and also treat punctuation and other non-uppercase letters as something to add a space on):

re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))

The output will be: 'Dave Is AFK Right Now! Cool'

I think regexes are the way to go here, but just to give a pure python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:

def splitCaps(s):
    result = []
    for ch, next in window(s+" ", 2):
        result.append(ch)
        if next.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)

window() is a utility function I use to operate on a sliding window of items, defined as:

import collections, itertools

def window(it, winsize, step=1):
    it=iter(it)  # Ensure we have an iterator
    l=collections.deque(itertools.islice(it, winsize))
    while 1:  # Continue till StopIteration gets raised.
        yield tuple(l)
        for i in range(step):
            l.append(it.next())
            l.popleft()
monkut

I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.

How about:

text = 'WordWordWord'
new_text = ''

for i, letter in enumerate(text):
    if i and letter.isupper():
        new_text += ' '

    new_text += letter
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!