Efficiently replace bad characters

梦毁少年i 2020-12-07 21:28

I often work with utf-8 text containing characters like:

\xc2\x99

\xc2\x95

\xc2\x85

etc.

6 Answers
  •  眼角桃花
    2020-12-07 21:51

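    import unicodedata

    # Decode the UTF-8 byte string to unicode (Python 2; on Python 3 use text.decode("utf-8"))
    text_to_unicode = unicode(text, "utf-8")

    # Decompose with NFKD, then encode to ASCII, dropping every character that has no ASCII form
    text_fixed = unicodedata.normalize('NFKD', text_to_unicode).encode('ascii', 'ignore')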
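
Note that the NFKD-plus-ASCII approach discards all non-ASCII text, not only the bad characters. If the goal is just to drop the characters listed in the question, a narrower option may be enough. The bytes `\xc2\x99`, `\xc2\x95`, and `\xc2\x85` decode to U+0099, U+0095, and U+0085, which are C1 control characters (U+0080 to U+009F). Below is a minimal Python 3 sketch that deletes only that range with `str.translate`. This is a different technique from the answer's, and `strip_c1_controls` and `_C1_TABLE` are hypothetical names chosen for illustration.

```python
# Hypothetical helper, not from the original answer.
# Maps every C1 control code point (U+0080..U+009F) to None, which tells
# str.translate to delete that character.
_C1_TABLE = dict.fromkeys(range(0x80, 0xA0))


def strip_c1_controls(raw: bytes) -> str:
    """Decode UTF-8 bytes and remove C1 control characters, keeping all other text."""
    text = raw.decode("utf-8")
    return text.translate(_C1_TABLE)


# "café" followed by U+0099, then " ok" followed by U+0085
print(strip_c1_controls(b"caf\xc3\xa9\xc2\x99 ok\xc2\x85"))  # prints: café ok
```

The translation table is built once and reused, so each call is a single pass over the string. Accented letters such as "é" survive, which would not be the case for characters without an ASCII decomposition under the ASCII-encoding approach.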
    
