How to convert UTF-8 fancy quotes to neutral quotes

前端未结


I'm writing a little Python script that parses Word docs and writes to a CSV file. However, some of the docs have some UTF-8 characters that my script can't process correc


2 Answers

野性不改

2020-12-16 01:00
You can use the Unidecode package to automatically convert all Unicode characters to their nearest pure ASCII equivalent.
```
# Unidecode is a third-party package (pip install Unidecode)
from unidecode import unidecode
line = unidecode(line)  # returns a new, ASCII-only string
```
This will handle both directions of double quotes as well as single quotes, em dashes, and other things that you probably haven't discovered yet.
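If transliterating every non-ASCII character is more than you want (Unidecode also rewrites accented letters, for instance), a narrower option is to map only the punctuation you care about with the standard library's `str.translate`. This is a minimal sketch; the table contents and the names `PUNCT_TABLE` and `to_neutral` are illustrative assumptions, not part of Unidecode:

```python
# Hypothetical helper: map only selected typographic characters to ASCII.
PUNCT_TABLE = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
})

def to_neutral(text):
    """Return a copy of text with the mapped characters replaced."""
    return text.translate(PUNCT_TABLE)
```

Characters that are not in the table, such as accented letters, pass through unchanged, so you would extend the table whenever a new troublesome character turns up.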
野性不改

2020-12-16 01:01
You can't assign to part of a string: Python strings are immutable, so you have to build a new string instead.

You can, however, just use the regex library, which might be the most flexible way to do this:
```
import re
# Replace left (U+201C) and right (U+201D) curly double quotes with a straight one
newline = re.sub('[\u201c\u201d]', '"', line)
```
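The `re.sub` approach can also cover single and double quotes in both directions in one pass, by passing a function as the replacement argument. A minimal sketch, where `QUOTE_MAP`, `QUOTE_RE`, and `neutralize_quotes` are names made up for illustration:

```python
import re

# Illustrative mapping (an assumption, extend as needed): curly quote -> straight quote.
QUOTE_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}
# Character class built from the mapping keys.
QUOTE_RE = re.compile("[" + "".join(QUOTE_MAP) + "]")

def neutralize_quotes(line):
    """Return a new string; the input is left unchanged because str is immutable."""
    return QUOTE_RE.sub(lambda m: QUOTE_MAP[m.group(0)], line)
```

Keeping the mapping in a dictionary means the pattern and the replacements cannot drift apart when more characters are added.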