cp1252 | 易学教程

Python3 different behaviour between latin-1 and cp1252 when decoding unmapped characters

Question: I'm trying to read a text file in Python 3, specifying encoding cp1252, and the file contains unmapped characters (for instance byte 0x8d).

    with open(inputfilename, mode='r', encoding='cp1252') as inputfile:
        print(inputfile.readlines())

I obviously get the following exception:

    Traceback (most recent call last):
      File "test.py", line 9, in <module>
        print(inputfile.readlines())
      File "/usr/lib/python3.6/encodings/cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
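The difference named in the title can be reproduced without a file. The following is a minimal sketch, assuming Python 3 and a made-up byte string (`raw` is illustrative, not data from the question): Python's latin-1 codec maps every byte to the code point with the same value, while its cp1252 codec leaves byte 0x8d unmapped and raises under the default strict error handling.

```python
# Illustrative bytes: "caf", 0xE9 (e-acute in both codecs), a space, and 0x8D.
raw = b"caf\xe9 \x8d"

# latin-1 maps every byte 0x00-0xFF to the code point with the same value,
# so decoding never fails; 0x8D becomes the control character U+008D.
latin1_text = raw.decode("latin-1")

# Python's cp1252 codec has no mapping for 0x8D, so strict decoding raises.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# errors="replace" substitutes U+FFFD for each undecodable byte instead.
replaced = raw.decode("cp1252", errors="replace")

print(repr(latin1_text), cp1252_failed, repr(replaced))
```

Whether latin-1 or an error handler is the right choice depends on what the unmapped bytes were meant to be, which the excerpt does not say.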

Character encoding in Excel spreadsheet (and what Java charset to use to decode it)

Question: I am using the JExcel library to read Excel spreadsheets. Each cell in the spreadsheet may contain localization strings in any of about 44 languages (English, Portuguese, French, Chinese, etc.). Today I don't tell the API anything regarding the encoding it's supposed to use. It's handling the Chinese OK, but it always screws up Portuguese and German. Somehow the default encoding (MacRoman on my dev box, UTF-8 on production) is failing to properly interpret the strings it pulls out of the

Python 3 chokes on CP-1252/ANSI reading

Question: I'm working on a series of parsers where I get a bunch of tracebacks from my unit tests like:

    File "c:\Python31\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 112: character maps to <undefined>

The files are opened with open() with no extra arguments. Can I pass extra arguments to open() or use something in the codecs module to open these differently? This came
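On the question of extra arguments: in Python 3, open() accepts `encoding` and `errors` keyword arguments. The sketch below writes a hypothetical sample file (the bytes and the temporary path are assumptions for illustration) and reads it back three ways. Which option is appropriate depends on the real encoding of the files, which the excerpt does not state.

```python
import os
import tempfile

# Hypothetical sample data: byte 0x81 has no mapping in Python's cp1252 codec.
sample = b"abc\x81def\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Explicit cp1252 with the default errors="strict" raises on 0x81.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors="replace" keeps reading and marks the bad byte with U+FFFD.
    with open(path, encoding="cp1252", errors="replace") as f:
        replaced = f.read()

    # latin-1 accepts every byte value, so it never raises while decoding.
    with open(path, encoding="latin-1") as f:
        as_latin1 = f.read()
finally:
    os.remove(path)

print(strict_failed, repr(replaced), repr(as_latin1))
```

If the files are actually UTF-8 or some other encoding, passing that name as `encoding` is the cleaner fix; `errors="replace"` only hides the mismatch.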

allow UTF-8 encoded filenames on (file-)webserver?

Question: I am hosting a small file server where users from all around the world can upload documents. Due to encoding problems (see other question), I am asking myself whether I should prevent users from uploading (and, conversely, downloading) files not supported by the CP1252 charset. Or, put the other way: is it sensible to allow users to upload documents with Arabic or Chinese letters in their filenames? PS: they download the same file some time later (and it should have the same filename as uploaded).

Answer 1: You should

Can I install additional encodings in eclipse?

Question: I need to use a Subversion repository that was created (and is still in use) under Windows, so the default encoding is CP-1252. Now I want to check out this repository on Linux and edit files there using Eclipse. I do not want to re-encode whole files to UTF-8 using iconv if possible, as I do not know how well the programs under Windows will behave. My first idea was to set the project encoding to CP-1252. Under my installation of Eclipse (Kepler on Linux) there

Convert cp1252 to unicode in javascript

Question: I need to convert CP1252 text to Unicode (UTF) in a JavaScript function. I have already found a function that converts CP1251 to UTF. Please help if you have this functionality, thanks!

Answer 1: If ISO-8859-1 is close enough, there is a special shortcut to convert ISO-8859-1-bytes-in-code-units to Unicode characters, due to the simple byte=code-point mapping:

    var chars= decodeURIComponent(escape(bytes));

For any other encoding there is no built-in functionality; you would have to include your own

Encoding cp-1252 as utf-8?

Question: I am trying to write a Java app that will run on a Linux server but will process files generated on legacy Windows machines using cp-1252 as the character set. Is there any way to encode these files as utf-8 instead of the cp-1252 they are generated in?

Answer 1: If the file names as well as the content are a problem, the easiest way to solve the problem is to set the locale on the Linux machine to something based on ISO-8859-1 rather than UTF-8. You can use locale -a to list available locales. For

How to deal with Non-ASCII Warning when performing Save on Python code edited with IDLE?

Question: I frequently edit Python code using IDLE, and occasionally when I save I receive an I/O Warning. I am assuming that I have inadvertently added a non-ASCII character, and I do not really want to declare the cp1252 encoding. Is there an easy way to find and delete the non-ASCII character that the warning relates to? The OS is Windows 7, and the Python version is 2.6.5.

Answer 1: The regex [^ -~] will match anything except printing ASCII characters. It should be able to find your stray
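A rough sketch of how the [^ -~] regex from the answer could be used to locate stray characters. It is written for Python 3 rather than the 2.6.5 mentioned in the question, and `find_stray_characters` and the sample text are hypothetical names introduced here for illustration.

```python
import re

# Matches any character outside the printable ASCII range (space through tilde).
NON_PRINTING_ASCII = re.compile(r"[^ -~]")


def find_stray_characters(source):
    """Return (line, column, character) for each match, both 1-based."""
    hits = []
    # Split on "\n" only, so unusual separators still show up as matches.
    for line_no, line in enumerate(source.split("\n"), start=1):
        for match in NON_PRINTING_ASCII.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits


# Hypothetical source text with curly quotes pasted in by accident.
sample = "x = 1\ny = \u201chello\u201d\n"
hits = find_stray_characters(sample)

for line_no, column, char in hits:
    print(line_no, column, repr(char))
```

Note that the pattern also matches tabs and other control characters, since they fall outside the space-to-tilde range, so a tab-indented file will produce extra matches.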

python utf-8 encoding throws UnicodeDecodeError despite “errors = 'replace' ”

Question: I'm trying to write out some text and encode it as utf-8 where possible, using the following code:

    outf.write((lang_name + "," + (script_name or "") + "\n").encode("utf-8", errors='replace'))

I'm getting the following error:

    File "C:\Python27\lib\encodings\cp1252.py", line 15, in decode
        return codecs.charmap_decode(input,errors,decoding_table)
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6: character maps to <undefined>

I thought the errors='replace' part of my