I'm trying to remove the punctuation from a Unicode string, which may contain non-ASCII letters. I tried using the regex module:
import regex text
\p{P} matches punctuation characters.
Among the ASCII characters, those are
! " # % & ' ( ) * , - . / : ; ? @ [ \ ] _ { }
< and > are not punctuation characters. So they won't be removed.
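To check which category a character falls into, one option is the standard-library unicodedata module. This is only a small sketch; the name `categories` and the sample characters are made up for illustration. Categories starting with "P" are punctuation, and those starting with "S" are symbols.

```python
import unicodedata

# Map a few sample ASCII characters to their Unicode general category.
# "P*" categories are punctuation, "S*" categories are symbols.
categories = {ch: unicodedata.category(ch) for ch in "!,<>$"}

for ch, cat in categories.items():
    print(repr(ch), cat)
# '!' Po
# ',' Po
# '<' Sm
# '>' Sm
# '$' Sc
```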
Try this instead:
regex.sub(r'[\p{P}<>]+', "", text)
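If installing the third-party regex module is not an option, roughly the same result can be had with only the standard library. The sketch below swaps in unicodedata for the \p{P} class; `strip_punctuation` and its `extra` parameter are hypothetical names chosen for this example, not part of any library.

```python
import unicodedata

def strip_punctuation(text, extra="<>"):
    """Drop characters whose Unicode category starts with "P",
    plus any characters listed in `extra` (hypothetical parameter)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P") and ch not in extra
    )

print(strip_punctuation("<Héllo, wörld!>"))  # Héllo wörld
```

Non-ASCII letters such as é and ö are kept because their category is "Ll", not a "P" category. Pass `extra=""` to leave < and > in place.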