Strip special characters and punctuation from a unicode string

滥情空心 2021-01-27 13:26

I'm trying to remove the punctuation from a Unicode string, which may contain non-ASCII letters. I tried using the regex module:

import regex
text          


        
3 Answers
  •  庸人自扰
    2021-01-27 13:55

    \p{P} matches punctuation characters.

    Among the ASCII characters, the ones it matches are

    ! " # % & ' ( ) * , - . / : ; ? @ [ \ ] _ { }
    

    < and > are not punctuation characters. Unicode classifies them as math symbols (\p{Sm}), like + = | and ~, so they won't be removed.

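    Try this instead, which adds < and > to the character class explicitly. It calls regex.sub because the standard re module does not support \p{...}:

    regex.sub(r'[\p{P}<>]+', '', text)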
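If the third-party regex module is not available, the same idea (drop everything in the Unicode punctuation categories, plus < and >) can be sketched with the standard library alone. This swaps the regex for a per-character check with unicodedata. The function name strip_punctuation and its extra parameter are made up for illustration and do not come from the question.

```python
import unicodedata


def strip_punctuation(text, extra="<>"):
    # Hypothetical helper: keeps a character only if its Unicode general
    # category does not start with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po) and
    # it is not listed in `extra`.
    return "".join(
        ch
        for ch in text
        if not unicodedata.category(ch).startswith("P") and ch not in extra
    )


# Non-ASCII letters survive; punctuation and the angle brackets are dropped.
print(strip_punctuation("héllo, <wörld>!"))
```

Symbols such as + or = are left alone unless they are passed in extra, which mirrors how \p{P} treats them.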
    
