After finding the answer to this question, we ran into the following unusual replacement behavior.
Our regex is:
[\\(（\\[{【]+(\\w+|\\s+|\\S+|\\W+)?[）\\)\\]}】]+
We are trying to match all content inside any type of brackets, including the brackets themselves.
The original text is:
物理化学名校考研真题详解 (理工科考研辅导系列(化学生物类))
The result is:
The code for the replacement is:
delimiter = ' '
if localization == 'CN':
    delimiter = ''
p = re.compile(codecs.encode(unicode(regex), "utf-8"), flags=re.I)
columnString = p.sub(delimiter, columnString).strip()
We ran into the same problem with this regex:
(\\d*[满|元])
print repr(columnString)='\xe5\xbd\x93\xe4\xbb\xa3\xe9\xaa\xa8\xe4\xbc\xa4\xe7\xa7\x91\xe5\xa6\x99\xe6\x96\xb9(\xe7\xac\xac\xe5\x9b\x9b\xe7\x89\x88)'
print repr(regex)=u'[\\(\uff08\\[{\u3010]+(\\w+|\\s+|\\S+|\\W+)?[\uff09\\)\\]}\u3011]+'
print repr(p.pattern)='[\\(\xef\xbc\x88\\[{\xe3\x80\x90]+(\\w+|\\s+|\\S+|\\W+)?[\xef\xbc\x89\\)\\]}\xe3\x80\x91]+'
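
One thing the repr output suggests, though we have not confirmed it is the cause: p.pattern is a UTF-8 byte string, so each character class lists single bytes (\xef, \xbc, \x88, ...) rather than whole characters such as （ or 【. A lone byte inside an unrelated multibyte character can then match a bracket class. The sketch below contrasts byte-level and character-level matching on the same data. The names raw, byte_result, and char_result are ours, not part of the original code, and the snippet is written so that it runs on both Python 2 and Python 3.

```python
import re

# Pattern and sample bytes copied from the repr() output above.
regex = u'[\\(\uff08\\[{\u3010]+(\\w+|\\s+|\\S+|\\W+)?[\uff09\\)\\]}\u3011]+'
raw = (b'\xe5\xbd\x93\xe4\xbb\xa3\xe9\xaa\xa8\xe4\xbc\xa4\xe7\xa7\x91'
       b'\xe5\xa6\x99\xe6\x96\xb9(\xe7\xac\xac\xe5\x9b\x9b\xe7\x89\x88)')

# Byte-level matching, as in the question: both the pattern and the text
# are UTF-8 bytes, so the classes contain individual bytes. The byte \xbc
# (part of the fullwidth parenthesis) also occurs inside \xe4\xbc\xa4,
# which lets a match start in the middle of a character.
byte_pattern = re.compile(regex.encode('utf-8'), flags=re.I)
byte_result = byte_pattern.sub(b'', raw).strip()

# Character-level matching (assumed alternative): decode the text first
# and compile the unicode pattern with re.UNICODE.
char_pattern = re.compile(regex, flags=re.I | re.U)
char_result = char_pattern.sub(u'', raw.decode('utf-8')).strip()

print(repr(byte_result))
print(repr(char_result))
```

If this reading is right, keeping both the pattern and columnString as unicode objects until the final output step should avoid the problem, at the cost of decoding columnString before the substitution.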