Whenever I try to read UTF-8 encoded text files using open(file_name, encoding='utf-8'), I always get an error saying the ASCII codec can't decode some characters.
I had a similar problem. For me, the environment variable LANG was initially not set (you can check this by running env), so Python fell back to ASCII, which it reports as ANSI_X3.4-1968:
$ python3 -c 'import locale; print(locale.getdefaultlocale())'
(None, None)
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968
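The same checks can be collected in one small script. This is only a sketch: describe_encodings is a hypothetical helper name, and it assumes that open() without an encoding= argument follows the preferred encoding, which holds unless Python's UTF-8 mode is enabled.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encodings Python derives from the current environment."""
    # Opening a file without encoding= reveals the default open() would use.
    with open(os.devnull) as f:
        open_default = f.encoding
    return {
        "LANG": os.environ.get("LANG"),  # None if the variable is unset
        "preferred": locale.getpreferredencoding(False),
        "open() default": open_default,
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

Running it before and after changing LANG makes it easy to see which of the values actually moved.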
The locales available to me (on a fresh Ubuntu 18.04 Docker image) were:
$ locale -a
C
C.UTF-8
POSIX
So I picked the UTF-8 one:
$ export LANG="C.UTF-8"
And then things worked:
$ python3 -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
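If you pick a locale that is not available, such as
$ export LANG="en_US.UTF-8"
it will not work. getdefaultlocale() only parses the environment variable, so it still looks fine, but the preferred encoding falls back to ASCII: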
$ python3 -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968
This is also why the locale command prints these error messages:
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
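To check from Python whether a locale name is actually installed, one option is to try setting it and catch the failure. This is a minimal sketch, not the only way to do it: locale_is_available is a hypothetical helper name, and it relies on locale.setlocale raising locale.Error for a name the C library rejects, which is what glibc does (other C libraries may be more permissive).

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    # Calling setlocale with only a category queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        # Raised when the requested locale is not supported.
        return False
    else:
        return True
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C.UTF-8", "en_US.UTF-8"):
        print(candidate, locale_is_available(candidate))
```

On an image like the one above, where locale -a lists only C, C.UTF-8 and POSIX, a check like this should tell you up front whether the value you plan to export in LANG is usable.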