UnicodeEncodeError: 'gbk' codec can't encode character: illegal multibyte sequence

伪装坚强ぢ 2020-12-05 17:20

I want to fetch the HTML content from a URL and parse it with regular expressions. However, the HTML contains some multibyte characters, so I ran into the error described in the title.

3 answers
  •  Happy的楠姐
    2020-12-05 17:50

    Combining the above answers, I found the following code works very well.

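    import requests

    # Fetch the page as raw bytes, then decode explicitly (assumes the page is UTF-8).
    r = requests.get("https://www.example.com/").content
    str_content = r.decode("utf-8")

    # Pass encoding explicitly so the write does not fall back to the locale default (e.g. gbk).
    with open("contents.txt", "w", encoding="utf-8") as fp:
        fp.write(str_content)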
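
As for why passing encoding='utf-8' helps: this error typically appears when open() is called without an encoding on a system whose locale default is GBK (for example Windows with a Chinese locale), and the text contains a character that GBK cannot represent. The sketch below illustrates that idea without any network access. The sample string and the helper can_encode are made up for illustration; "\xa0" (the non-breaking space that &nbsp; produces in HTML) is used because it is a common character that the gbk codec rejects.

```python
import os
import tempfile

# Hypothetical sample text: "\xa0" is a non-breaking space, which has no GBK mapping.
html_text = "<p>price:\xa0100</p>"


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without an error."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# GBK cannot represent "\xa0", while UTF-8 can represent any Unicode character.
gbk_ok = can_encode(html_text, "gbk")
utf8_ok = can_encode(html_text, "utf-8")

# Writing with an explicit UTF-8 encoding succeeds regardless of the locale default.
path = os.path.join(tempfile.mkdtemp(), "contents.txt")
with open(path, "w", encoding="utf-8") as fp:
    fp.write(html_text)
with open(path, encoding="utf-8") as fp:
    round_trip = fp.read()

# If the output really must be GBK, errors="replace" substitutes "?" for
# characters that cannot be encoded (this loses information).
lossy = html_text.encode("gbk", errors="replace").decode("gbk")

print(gbk_ok, utf8_ok, round_trip == html_text, lossy)
```

Writing UTF-8 keeps the text intact, so it is usually the safer choice. errors="replace" is only worth considering when a downstream tool strictly requires GBK.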
    
