UnicodeEncodeError: 'gbk' codec can't encode character: illegal multibyte sequence


伪装坚强ぢ 2020-12-05 17:20

I want to fetch the HTML content of a URL and parse it with regular expressions. The HTML contains some multibyte characters, so I ran into the error described in the title.

3 answers

Happy的楠姐 (OP)

2020-12-05 17:50

Combining the answers above, I found that the following code works well.

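import requests

# Fetch the page as raw bytes, then decode them explicitly as UTF-8.
r = requests.get("https://www.example.com/").content
str_content = r.decode('utf-8')
# Pass encoding='utf-8' so open() does not fall back to the locale codec
# (gbk on a Chinese-locale Windows system).
with open("contents.txt", "w", encoding='utf-8') as fp:
    fp.write(str_content)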
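For anyone wondering why the explicit encoding matters, here is a minimal sketch that reproduces the failure without any network access. The `sample` string and the helper names `can_encode` and `roundtrip_utf8` are my own, made up for illustration; the idea is only that a character with no GBK mapping fails under `gbk` but survives a UTF-8 write and read.

```python
import os
import tempfile

# Hypothetical sample text: the emoji U+1F600 has no mapping in GBK.
sample = "<p>hello \U0001F600</p>"


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def roundtrip_utf8(text: str) -> str:
    """Write text to a temporary UTF-8 file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as fp:
            fp.write(text)
        with open(path, "r", encoding="utf-8") as fp:
            return fp.read()
    finally:
        os.remove(path)


print(can_encode(sample, "gbk"))
print(can_encode(sample, "utf-8"))
print(roundtrip_utf8(sample) == sample)
```

If `open()` is called in text mode without `encoding=`, Python uses the locale's preferred encoding, which is presumably where the `gbk` in the error message came from.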
