UnicodeEncodeError: 'charmap' codec can't encode characters

江枫思渺然 提交于 2019-11-26 01:35:44

问题


I\'m trying to scrape a website, but it gives me an error.

I\'m using the following code:

import urllib.request
from bs4 import BeautifulSoup

get = urllib.request.urlopen(\"https://www.website.com/\")
html = get.read()

soup = BeautifulSoup(html)

print(soup)

And I\'m getting the following error:

File \"C:\\Python34\\lib\\encodings\\cp1252.py\", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: \'charmap\' codec can\'t encode characters in position 70924-70950: character maps to <undefined>

What can I do to fix this?


回答1:


I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

with open(fname, "w") as f:
    f.write(html)

with this:

import io
with io.open(fname, "w", encoding="utf-8") as f:
    f.write(html)

Using io gives you backward compatibility with Python 2.

If you only need to support Python 3 you can use the builtin open function instead:

with open(fname, "w", encoding="utf-8") as f:
    f.write(html)



回答2:


I fixed it by adding .encode("utf-8") to soup.

That means that print(soup) becomes print(soup.encode("utf-8")).




回答3:


In Python 3.7, and running Windows 10 this worked (I am not sure whether it will work on other platforms and/or other versions of Python)

Replacing this line:

with open('filename', 'w') as f:

With this:

with open('filename', 'w', encoding='utf-8') as f:

The reason why it is working is because the encoding is changed to UTF-8 when using the file, so characters in UTF-8 are able to be converted to text, instead of returning an error when it encounters a UTF-8 character that is not suppord by the current encoding.




回答4:


While saving the response of get request, same error was thrown on Python 3.7 on window 10. The response received from the URL, encoding was UTF-8 so it is always recommended to check the encoding so same can be passed to avoid such trivial issue as it really kills lots of time in production

import requests
resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
print(resp.encoding)
with open ('NiftyList.txt', 'w') as f:
    f.write(resp.text)

When I added encoding="utf-8" with the open command it saved the file with the correct response

with open ('NiftyList.txt', 'w', encoding="utf-8") as f:
    f.write(resp.text)



回答5:


Even I faced the same issue with the encoding that occurs when you try to print it, read/write it or open it. As others mentioned above adding .encoding="utf-8" will help if you are trying to print it.

soup.encode("utf-8")

If you are trying to open scraped data and maybe write it into a file, then open the file with (......,encoding="utf-8")

with open(filename_csv , 'w', newline='',encoding="utf-8") as csv_file:




回答6:


For those still getting this error, adding encode("utf-8") to soup will also fix this.

soup = BeautifulSoup(html_doc, 'html.parser').encode("utf-8")
print(soup)


来源:https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!