character-encoding | 易学教程

Converting Exception to a string in Python 3

Question: Does anyone know why this Python 3.2 code

    try:
        raise Exception('X')
    except Exception as e:
        print("Error {0}".format(str(e)))

works without problems (apart from Unicode encoding in the Windows shell :/), but this

    try:
        raise Exception('X')
    except Exception as e:
        print("Error {0}".format(str(e, encoding = 'utf-8')))

throws TypeError: coercing to str: need bytes, bytearray or buffer-like object, Exception found? How do I convert an Error to a string with a custom encoding? Edit It does not works
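A minimal sketch of one way to get an encoded error message, assuming the goal is UTF-8 bytes. In Python 3, `str(obj, encoding=...)` only decodes bytes-like objects, so the text is produced with `str(e)` first and encoded afterwards. The helper name `describe_error` is hypothetical and not part of the original question.

```python
def describe_error(exc, encoding="utf-8"):
    """Hypothetical helper: format an exception as text, then encode it."""
    # str(exc) already returns text; encoding is a separate, later step.
    message = "Error {0}".format(str(exc))
    return message.encode(encoding, errors="replace")


try:
    raise Exception("X")
except Exception as e:
    encoded = describe_error(e)
    print(encoded)
```

Whether bytes are needed at all depends on where the message goes; for printing to a console, passing the plain `str(e)` to `print` is usually enough.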

Why is the file name displayed as a question mark after connecting to ftp?

Question: If you connect to an FTP server from a cmd window and check the file names, they are displayed as question marks. Why does this happen? Source: https://stackoverflow.com/questions/62872262/why-is-the-file-name-displayed-as-a-question-mark-after-connecting-to-ftp

How to encode a text stream into a byte stream in Python 3?

Question: Decoding a byte stream into a text stream is easy:

    import io
    f = io.TextIOWrapper(io.BytesIO(b'Test\nTest\n'), 'utf-8')
    f.readline()

In this example, io.BytesIO(b'Test\nTest\n') is a byte stream and f is a text stream. I want to do exactly the opposite of that. Given a text stream or file-like object, I would like to encode it into a byte stream or file-like object without processing the entire stream. This is what I've tried so far:

    import io, codecs
    f = codecs.getreader('utf-8')(io
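One possible approach, sketched under assumptions: wrap the text stream in a small `io.RawIOBase` subclass that reads a chunk of characters at a time and encodes it with an incremental encoder. The class name `EncodingReader` and its `chunk_chars` parameter are hypothetical; this is not the accepted answer to the question, only an illustration of lazy encoding.

```python
import codecs
import io


class EncodingReader(io.RawIOBase):
    """Hypothetical helper: expose a text stream as a readable byte stream."""

    def __init__(self, text_stream, encoding="utf-8", chunk_chars=4096):
        self._text = text_stream
        self._encoder = codecs.getincrementalencoder(encoding)()
        self._chunk_chars = chunk_chars
        self._pending = b""   # encoded bytes not yet handed to the caller
        self._eof = False

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull and encode text only when the caller asks for more bytes.
        while not self._pending and not self._eof:
            chunk = self._text.read(self._chunk_chars)
            if chunk:
                self._pending = self._encoder.encode(chunk)
            else:
                self._eof = True
                self._pending = self._encoder.encode("", final=True)
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of stream


text_stream = io.StringIO("Test\nTest\n")
byte_stream = io.BufferedReader(EncodingReader(text_stream))
first_line = byte_stream.readline()
print(first_line)
```

Wrapping the raw reader in `io.BufferedReader` supplies `readline()` and buffered `read()` on top of `readinto()`.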

Weird leading characters utf-8/utf-16 encoding in Python

Question: I have written a simplified version to demonstrate the problem. I am encoding special characters in UTF-8 and UTF-16. With UTF-8 encoding there is no problem, but when I encode with UTF-16 I get some weird leading characters. I tried to remove all trailing and leading characters, but the error still persists. Sample of code:

    #!/usr/bin/env python2
    # -*- coding: utf-8 -*-
    import chardet

    def myEncode(s, pattern):
        try:
            s.strip()
            u = unicode(s, pattern)
            print chardet.detect(u.encode
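The excerpt is cut off, so the cause is not confirmed here, but leading bytes after UTF-16 encoding are very often the byte order mark (BOM). The sketch below uses Python 3 rather than the Python 2 of the question, and shows that the generic `utf-16` codec prepends a BOM while the explicit `utf-16-le` codec does not.

```python
import codecs

text = "\u00e9"  # a single non-ASCII character

with_bom = text.encode("utf-16")        # BOM first, then native byte order
without_bom = text.encode("utf-16-le")  # explicit byte order, no BOM

has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(has_bom, len(with_bom), len(without_bom))

# Decoding with the generic codec consumes the BOM again.
round_trip = with_bom.decode("utf-16")
```

Stripping whitespace from the text does not remove these bytes, because they are added by the codec at encoding time.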

How does UTF-16 achieve self-synchronization?

Question: I know that UTF-16 is a self-synchronizing encoding scheme. I also read the Wikipedia article "Self-synchronizing code", but did not quite get it. Can you please explain it to me with an example in UTF-16? Answer 1: In UTF-16, characters outside the BMP are represented by a surrogate pair, in which the first code unit (CU) lies between 0xD800 and 0xDBFF and the second between 0xDC00 and 0xDFFF. Each CU of the pair carries 10 bits of the code point (after 0x10000 has been subtracted from it). Characters in the BMP are encoded as themselves. Now the synchronization
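The ranges described in the answer can be illustrated with a short sketch. Because high surrogates, low surrogates, and BMP code units occupy disjoint ranges, a reader that lands on an arbitrary code unit can tell from that unit (and at most its predecessor) where the current character starts. The function names below are hypothetical.

```python
import struct


def code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_high_surrogate(cu):
    return 0xD800 <= cu <= 0xDBFF


def is_low_surrogate(cu):
    return 0xDC00 <= cu <= 0xDFFF


def sync_to_boundary(units, index):
    """Return the index at which the character containing units[index] starts."""
    if is_low_surrogate(units[index]) and index > 0 and is_high_surrogate(units[index - 1]):
        return index - 1  # landed on the second half of a surrogate pair
    return index          # BMP code unit or first half of a pair


units = code_units("a\U0001F600b")  # 'a', one non-BMP character, 'b'
print([hex(u) for u in units])
print(sync_to_boundary(units, 2))
```

Note that this synchronization works at the level of 16-bit code units; landing on an odd byte offset inside a code unit is a separate problem.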

UnicodeDecodeError 'charmap' codec with Tesseract OCR in Python

Question: I am trying to do OCR on an image file in Python using Tesseract OCR. My environment is Python 3.5 (Anaconda) on a Windows machine. Here is the code:

    from PIL import Image
    from pytesseract import image_to_string
    out = image_to_string(Image.open('sample.png'))

The error I am getting is:

    File "Anaconda3\lib\sitepackages\pytesseract\pytesseract.py", line 167, in image_to_string
        return f.read().strip()
    File "Anaconda3\lib\encodings\cp1252.py", line 23 in decode
        return codecs.charmap_decode(input,
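The traceback shows a text file being decoded with cp1252, the default on many Windows setups. The sketch below reproduces that mechanism in isolation, assuming the file actually contains UTF-8 text; it does not use or patch pytesseract, and the helper `read_ocr_output` is hypothetical.

```python
import os
import tempfile


def read_ocr_output(path, encoding="utf-8"):
    """Hypothetical helper: read a result file with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read().strip()


# Simulate a result file that contains UTF-8 encoded text.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("\u00c1rbol\n".encode("utf-8"))  # starts with bytes C3 81

try:
    read_ocr_output(path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    # Byte 0x81 has no mapping in Python's cp1252 table.
    cp1252_failed = True
    print("cp1252 failed:", exc)

text = read_ocr_output(path)  # explicit UTF-8 succeeds
os.remove(path)
```

If the file is opened inside a library, the equivalent fix is for that library to pass an explicit encoding when opening the file.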

String Comparison, .NET and non breaking space

Question: I have an app written in C# that does a lot of string comparison. The strings are pulled in from a variety of sources (including user input) and are then compared. However, I'm running into problems when comparing space '32' to non-breaking space '160'. To the user they look the same, so they expect a match. But when the app does the compare, there is no match. What is the best way to go about this? Am I going to have to go to all parts of the code that do a string compare and manually

Postgres upper function on turkish character does not return expected result

Question: It looks like the Postgres upper/lower functions do not handle some characters in the Turkish character set.

    select upper('Aaı'), lower('Aaİ') from mytable;

returns: AAı, aaİ
instead of: AAI, aai

Note that normal English characters are converted correctly, but not the Turkish I (lower or upper).

Postgres version: 9.2 32 bit
Database encoding (same result in any of these): UTF-8, WIN1254, C
Client encoding: UTF-8, WIN1254, C
OS: Windows 7 Enterprise edition 64 bit

SQL functions lower and upper