How to convert utf-8 fancy quotes to neutral quotes

面向向阳花 2020-12-16 00:24

I'm writing a little Python script that parses Word docs and writes to a CSV file. However, some of the docs have some UTF-8 characters that my script can't process correc

2 Answers
  • 2020-12-16 01:00

    You can use the Unidecode package to automatically convert all Unicode characters to their nearest pure ASCII equivalent.

    from unidecode import unidecode
    line = unidecode(line)
    

    This will handle both directions of double quotes as well as single quotes, em dashes, and other things that you probably haven't discovered yet.

  • 2020-12-16 01:01

    You can't assign to part of a string: Python strings are immutable, so any replacement has to produce a new string.

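    You can, however, just use the re module, which might be the most flexible way to do this. The call below replaces only the left double quote (U+201C); each other character needs its own pattern: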

    import re
    newline = re.sub(u'\u201c','"',line)
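
    The single-character substitution above can be extended to cover both directions of single and double quotes in one pass. The following is a minimal sketch, not part of the original answer: the names REPLACEMENTS, PATTERN, and neutralize_quotes are hypothetical, and the mapping assumes that only the four common curly quotes need handling.

```python
import re

# Hypothetical mapping for illustration; extend it with whatever
# other characters turn up in the documents.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

# Character class that matches any key of the mapping.
PATTERN = re.compile("[" + "".join(REPLACEMENTS) + "]")


def neutralize_quotes(line):
    """Return a new string with curly quotes replaced by straight ones."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], line)


sample = "\u201cIt\u2019s done,\u201d she said."
result = neutralize_quotes(sample)
print(result)
```

    Because strings are immutable, the function returns a new string rather than changing its argument. Passing the same mapping through str.maketrans and str.translate would likely work just as well for one-to-one character swaps.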
    