Question
I want "git log --format='(%h) %s' --abbrev=7 HEAD"
to be split into
[
"git",
"log",
"--format='(%h) %s'",
"--abbrev=7",
"HEAD"
]
How do I achieve this without splitting on the space within --format='(%h) %s'?
Answers in any language are welcome :)
Answer 1:
As often in life, you have choices.
Use an expression that matches and captures different parts. This can be combined with a replacement function, as in:
import re

string = "git log --format='(%h) %s' --abbrev=7 HEAD"
rx = re.compile(r"'[^']*'|(\s+)")

def replacer(match):
    if match.group(1):
        return "#@#"
    else:
        return match.group(0)

string = rx.sub(replacer, string)
parts = re.split('#@#', string)
#                 ^^^ same as in the function replacer
You could use the third-party regex module, which supports (*SKIP)(*FAIL):
import regex as re

string = "git log --format='(%h) %s' --abbrev=7 HEAD"
rx = re.compile(r"'[^']*'(*SKIP)(*FAIL)|\s+")
parts = rx.split(string)
Write yourself a little parser:
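def little_parser(string):
    quote = False   # True while we are inside single quotes
    stack = ''      # characters collected for the current part
    for char in string:
        if char == "'":
            stack += char
            quote = not quote
        elif (char == ' ' and not quote):
            yield stack
            stack = ''
        else:
            stack += char
    if stack:
        yield stack

your_string = "git log --format='(%h) %s' --abbrev=7 HEAD"
for part in little_parser(your_string):
    print(part)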
All three will yield
['git', 'log', "--format='(%h) %s'", '--abbrev=7', 'HEAD']
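The claim that the approaches agree can be checked directly. The following is only a sketch using the standard library, so the regex-module variant is left out. The name split_with_sentinel and the NUL sentinel are illustrative choices, not part of the answer, and the sentinel is assumed never to occur in the input.

```python
import re

QUOTED_OR_SPACES = re.compile(r"'[^']*'|(\s+)")
SENTINEL = "\x00"  # assumption: this character never appears in the input


def split_with_sentinel(text):
    """Replace whitespace outside single quotes by a sentinel, then split on it."""
    marked = QUOTED_OR_SPACES.sub(
        lambda m: SENTINEL if m.group(1) else m.group(0), text
    )
    return marked.split(SENTINEL)


def little_parser(text):
    """Character-by-character splitter that toggles a flag on each single quote."""
    quote = False
    stack = ''
    for char in text:
        if char == "'":
            stack += char
            quote = not quote
        elif char == ' ' and not quote:
            yield stack
            stack = ''
        else:
            stack += char
    if stack:
        yield stack


command = "git log --format='(%h) %s' --abbrev=7 HEAD"
print(split_with_sentinel(command) == list(little_parser(command)))
```

Note that the two are not identical on every input: with runs of several spaces the regex treats the run as one separator, while the parser yields an empty string for each extra space.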
Answer 2:
As I understand, the idea is to split the string on contiguous spaces except where the spaces are part of a substring surrounded by single quotes. I believe this will work:
/(?:[^ ']*(?:'[^']+')?[^ ']*)*/
but invite readers to subject it to careful scrutiny.
This regex can be made self-documenting by writing it in free-spacing mode:
/
(?: # begin a non-capture group
[^ ']* # match 0+ chars other than spaces and single quotes
(?: # begin non-capture group
'[^']+' # match 1+ chars other than single quotes, surrounded
# by single quotes
)? # end non-capture group and make it optional
[^ ']* # match 0+ chars other than spaces and single quotes
)* # end non-capture group and execute it 0+ times
/x # free-spacing regex definition mode
This obviously will not work if there are nested single quotes.
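For readers working in Python rather than a language with /.../ regex literals, the free-spacing pattern can be ported roughly as follows. This is a sketch and not part of the original answer: it assumes Python's built-in re module with the re.VERBOSE flag, and the names TOKEN and split_args are made up for illustration. Because the pattern can also match the empty string (at each space and at the end of the input), empty matches are filtered out.

```python
import re

# re.VERBOSE ignores whitespace in the pattern, except inside a character
# class, so the literal space in [^ '] is preserved.
TOKEN = re.compile(r"""
    (?:             # begin a non-capture group
      [^ ']*        # 0+ chars other than spaces and single quotes
      (?:'[^']+')?  # optionally, a single-quoted run of 1+ chars
      [^ ']*        # 0+ chars other than spaces and single quotes
    )*              # repeat the group 0+ times
""", re.VERBOSE)


def split_args(command):
    """Return the non-empty matches of TOKEN, i.e. the space-separated parts."""
    return [tok for tok in TOKEN.findall(command) if tok]


print(split_args("git log --format='(%h) %s' --abbrev=7 HEAD"))
```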
@n.'pronouns'm. suggested an alternative regex that also works:
/([^ "']|'[^'"]*')*/
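One thing to watch when trying this alternative in Python: it contains a capturing group, so re.findall would return only the last repetition of that group rather than the whole token. A hedged sketch that avoids this by reading group(0) from re.finditer is shown here; ALT_TOKEN and split_args_alt are hypothetical names, and empty matches are dropped as before.

```python
import re

ALT_TOKEN = re.compile(r"""([^ "']|'[^'"]*')*""")


def split_args_alt(command):
    """Collect whole matches; group(0) ignores the repeated capturing group."""
    return [m.group(0) for m in ALT_TOKEN.finditer(command) if m.group(0)]


print(split_args_alt("git log --format='(%h) %s' --abbrev=7 HEAD"))
```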
Answer 3:
I found one possible (albeit ugly) solution in Python (which also works with "):
>>> import re
>>> foo = '''git log --format='(%h) %s' --foo="a b" --bar='c d' HEAD'''
>>> re.findall(r'''(\S*'[^']+'\S*|\S*"[^"]+"\S*|\S+)''', foo)
['git', 'log', "--format='(%h) %s'", '--foo="a b"', "--bar='c d'", 'HEAD']
Source: https://stackoverflow.com/questions/60502562/how-do-i-regex-split-by-space-avoiding-spaces-within-apostrophes