Question
I want to replace all whitespace characters (except \n) with "". I tried using a regular expression with \s+, but it matches the newline character as well.
Is there any method to skip \n in \s in regex?
Answer 1:
If you do not need to handle Unicode whitespace, you could use
[ \t\r\f\v]
Or, since \v matches a vertical tab (VT, \x0b), \r is also considered a line break, and \f (form feed, \x0c) is also a kind of vertical whitespace (rather obsolete now), you can restrict the class to horizontal whitespace only:
[ \t]
See docs:
\s
When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
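As a minimal sketch of the ASCII-only approach, the snippet below removes spaces and tabs with [ \t] and shows that a Unicode no-break space is left untouched. The sample string and the variable names are made up for illustration, and it assumes Python 3, where str patterns are Unicode-aware by default.

```python
import re

# Hypothetical sample: a space, a tab, a no-break space (U+00A0) and a line feed.
sample = 'a b\tc\u00a0d\ne'

# The explicit class only knows about the characters listed in it.
ascii_only = re.sub(r'[ \t]', '', sample)

# \s on a str pattern in Python 3 also matches the no-break space (and \n).
all_ws = re.sub(r'\s', '', sample)

print(repr(ascii_only))
print(repr(all_ws))
```

The explicit class is the most predictable option when the input is known to be ASCII; otherwise one of the Unicode-aware patterns is probably the safer choice.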
If you need to support all Unicode spaces, use
\s(?<!\n)
This expression will match any whitespace that is not a line feed.
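A small sketch of how this lookbehind pattern might be used with re.sub, assuming Python 3 str input. The sample text and names are hypothetical.

```python
import re

# Match one whitespace character, then check that the character just
# consumed was not a line feed.
NON_LF_WS = re.compile(r'\s(?<!\n)')

# Hypothetical sample with a no-break space (U+00A0) and an em space (U+2003).
sample = 'a\u00a0b\u2003c\nd e'
result = NON_LF_WS.sub('', sample)

print(repr(result))
```

Here the Unicode spaces and the ordinary space are removed, while the line feed between c and d is preserved.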
Another way to add a restriction to a positive shorthand character class is to use its opposite inside a negated character class. \S is the opposite shorthand character class of \s, so we put it into [^...] and add the character from \s that we need to exclude:
[^\S\n]
Add \r, \v, etc. if you need to exclude all line breaks. [^\S\n] matches any character that is neither a non-whitespace character nor a line feed, that is, any whitespace except a line feed.
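The negated class extended with \r could look like the sketch below. The sample string is invented, and the choice to keep both \r and \n is an assumption about what counts as a line break for the input at hand.

```python
import re

# Any whitespace except carriage return and line feed.
NON_BREAK_WS = re.compile(r'[^\S\r\n]+')

# Hypothetical sample with a Windows line ending and a no-break space.
sample = 'one \t two\r\nthree\u00a0four\n'
result = NON_BREAK_WS.sub('', sample)

print(repr(result))
```

Because the class is built from \S, it picks up whatever the regex engine treats as whitespace, so no explicit list of Unicode spaces is needed.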
Answer 2:
The documentation says that \s matches [ \t\n\r\f\v]. So you just need to replace \s+ with [ \t\r\f\v]+ in order to skip \n.
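To make the difference concrete, here is a short comparison of the two patterns on an invented sample string. The variable names are illustrative only.

```python
import re

s = 'foo  bar\n\nbaz\t qux'

# \s+ swallows the line feeds together with the spaces and the tab.
with_shorthand = re.sub(r'\s+', '', s)

# The explicit class leaves \n alone.
with_explicit = re.sub(r'[ \t\r\f\v]+', '', s)

print(repr(with_shorthand))
print(repr(with_explicit))
```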
Answer 3:
You can use the negated character class [^\S\n] where \S is all that is not a whitespace:
re.sub(r'[^\S\n]', '', s)
Answer 4:
\s matches [ \t\n\r\f\v]; if you want to remove only spaces, you can use the following:
>>> import re
>>> re.sub(' ', '', 'test string\nwith new line')
Since ' ' matches a space (literally), this will remove all spaces but will keep the \n character.
Answer 5:
Is there any method to skip \n in \s in regex?
You may use a negative lookahead.
re.sub(r'(?!\n)\s', '', s)
If you also want to skip carriage return then add \r inside the negative lookahead.
re.sub(r'(?!\n|\r)\s', '', s)
It's like a kind of subtraction, i.e., the regex above subtracts \n and \r from \s.
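A quick sketch of the lookahead version on a made-up input follows. The character-class form (?![\r\n]) is shown next to the alternation as an equivalent spelling, not as something taken from the answer itself.

```python
import re

# Hypothetical sample with both \r\n and \n line endings.
s = 'a b\r\nc\td\n'

# The lookahead rejects positions where the next character is \n or \r,
# so \s can only consume the remaining whitespace characters.
with_alternation = re.sub(r'(?!\n|\r)\s', '', s)
with_char_class = re.sub(r'(?![\r\n])\s', '', s)

print(repr(with_alternation))
print(repr(with_char_class))
```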
Source: https://stackoverflow.com/questions/35838690/escaping-n-in-s-match-in-reg-ex-python