character-encoding

Converting Exception to a string in Python 3

百般思念 submitted on 2020-07-31 15:00:47
Question: Does anyone have an idea why this Python 3.2 code

    try:
        raise Exception('X')
    except Exception as e:
        print("Error {0}".format(str(e)))

works without problems (apart from Unicode encoding in the Windows shell), but this

    try:
        raise Exception('X')
    except Exception as e:
        print("Error {0}".format(str(e, encoding='utf-8')))

throws TypeError: coercing to str: need bytes, bytearray or buffer-like object, Exception found? How do I convert an Error to a string with a custom encoding? Edit: It does not work
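For the question above, it may help to note that `str(obj, encoding=...)` is the decoding form of the constructor and only accepts bytes-like objects, which is why passing an Exception raises TypeError. A minimal sketch of one way around it, assuming the goal is either the message as text or the message as encoded bytes (the helper names `exception_to_text` and `exception_to_bytes` are hypothetical, not from the question):

```python
def exception_to_text(exc):
    """Return the exception message as a str (already Unicode in Python 3)."""
    return str(exc)


def exception_to_bytes(exc, encoding="utf-8"):
    """Encode the message; unencodable characters are replaced, not raised."""
    return str(exc).encode(encoding, errors="replace")


try:
    raise Exception("X")
except Exception as e:
    print("Error {0}".format(exception_to_text(e)))
    payload = exception_to_bytes(e, "utf-8")  # bytes, e.g. for a log file
```

The encoding step only matters at the point where the text leaves the program (a file, a socket, a console), so it is usually applied there rather than inside `format()`.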

How to encode a text stream into a byte stream in Python 3?

寵の児 submitted on 2020-07-18 05:15:32
Question: Decoding a byte stream into a text stream is easy:

    import io
    f = io.TextIOWrapper(io.BytesIO(b'Test\nTest\n'), 'utf-8')
    f.readline()

In this example, io.BytesIO(b'Test\nTest\n') is a byte stream and f is a text stream. I want to do exactly the opposite. Given a text stream or file-like object, I would like to encode it into a byte stream or file-like object without processing the entire stream. This is what I've tried so far:

    import io, codecs
    f = codecs.getreader('utf-8')(io
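One possible approach to the question above, sketched under assumptions: wrap the text stream in a small `io.RawIOBase` subclass that pulls a chunk of characters at a time and pushes it through an incremental encoder. The class name `EncodingReader` and the `chunk_chars` parameter are hypothetical; this is not the asker's code, whose attempt is cut off in the excerpt.

```python
import codecs
import io


class EncodingReader(io.RawIOBase):
    """Hypothetical helper: expose a text stream as a readable byte stream."""

    def __init__(self, text_stream, encoding="utf-8", chunk_chars=4096):
        super().__init__()
        self._text = text_stream
        self._encoder = codecs.getincrementalencoder(encoding)()
        self._chunk_chars = chunk_chars
        self._pending = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Refill from the text stream only when no encoded bytes are left,
        # so at most one chunk is held in memory at a time.
        while not self._pending:
            chunk = self._text.read(self._chunk_chars)
            if not chunk:
                # End of input: flush whatever the encoder still buffers.
                self._pending = self._encoder.encode("", final=True)
                break
            self._pending = self._encoder.encode(chunk)
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of stream


# Wrapping in BufferedReader adds readline(), peek() and friends.
byte_stream = io.BufferedReader(EncodingReader(io.StringIO("Test\nTest\n")))
first_line = byte_stream.readline()
```

The incremental encoder keeps state between chunks, which matters for encodings that emit a prefix or carry state across calls.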

Weird leading characters utf-8/utf-16 encoding in Python

柔情痞子 submitted on 2020-07-03 11:54:29
Question: I have written a simplified version to demonstrate the problem. I am encoding special characters in UTF-8 and UTF-16. With UTF-8 there is no problem, but when I encode with UTF-16 I get some weird leading characters. I tried to remove all trailing and leading characters, but the error persists. Sample code:

    #!/usr/bin/env python2
    # -*- coding: utf-8 -*-
    import chardet

    def myEncode(s, pattern):
        try:
            s.strip()
            u = unicode(s, pattern)
            print chardet.detect(u.encode
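The excerpt is cut off, so the cause cannot be confirmed from it, but a common source of "weird leading characters" with UTF-16 is the byte order mark (BOM) that the generic `utf-16` codec prepends. A small sketch in Python 3 (the question itself uses Python 2) showing the difference between the generic codec and the explicit-endianness one:

```python
import codecs

text = "\u00e9"  # é

# The generic codec writes a 2-byte BOM first, in native byte order.
with_bom = text.encode("utf-16")

# Naming the byte order explicitly produces no BOM.
without_bom = text.encode("utf-16-le")

has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding with the generic codec consumes the BOM again.
round_trip = with_bom.decode("utf-16")
```

Stripping whitespace from the string does not help in this situation, because the BOM is added during encoding and is not part of the original text.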

How does UTF-16 achieve self-synchronization?

梦想的初衷 submitted on 2020-06-29 05:09:15
Question: I know that UTF-16 is a self-synchronizing encoding scheme. I also read the wiki article below, but did not quite get it: Self Synchronizing Code. Can you please explain it to me with a UTF-16 example?

Answer 1: In UTF-16, characters outside the BMP are represented using a surrogate pair in which the first code unit (CU) lies between 0xD800 and 0xDBFF and the second between 0xDC00 and 0xDFFF. Each CU carries 10 bits of the code point. Characters in the BMP are encoded as themselves. Now the synchronization
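The surrogate ranges described in the answer can be illustrated with a short sketch. Because the two ranges do not overlap each other or the BMP values, a reader that lands on an arbitrary code unit can tell from that unit alone whether it is in the middle of a pair, and needs to step back at most one unit. The function names below are hypothetical and the sketch assumes well-formed UTF-16 input.

```python
def code_units(text):
    """Split text into 16-bit UTF-16 code units (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_high_surrogate(cu):
    return 0xD800 <= cu <= 0xDBFF


def is_low_surrogate(cu):
    return 0xDC00 <= cu <= 0xDFFF


def sync_to_char_start(units, index):
    """Return the index where the character containing units[index] begins."""
    if (is_low_surrogate(units[index]) and index > 0
            and is_high_surrogate(units[index - 1])):
        return index - 1  # landed on the second half of a pair
    return index          # BMP character or first half of a pair


units = code_units("a\U0001F600b")  # 'a', one non-BMP character, 'b'
```

Here `units` has four entries: one for `a`, two for the non-BMP character, and one for `b`. Calling `sync_to_char_start(units, 2)` steps back to index 1, while indices 0, 1 and 3 are already character starts.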

UnicodeDecodeError 'charmap' codec with Tesseract OCR in Python

喜你入骨 submitted on 2020-06-27 18:18:12
Question: I am trying to do OCR on an image file in Python using Tesseract OCR. My environment is Python 3.5 (Anaconda) on a Windows machine. Here is the code:

    from PIL import Image
    from pytesseract import image_to_string
    out = image_to_string(Image.open('sample.png'))

The error I am getting is:

    File "Anaconda3\lib\sitepackages\pytesseract\pytesseract.py", line 167, in image_to_string
        return f.read().strip()
    File "Anaconda3\lib\encodings\cp1252.py", line 23 in decode
        return codecs.charmap_decode(input,
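The traceback above shows `f.read()` going through the cp1252 codec, which suggests the output file was opened with the platform default encoding. A minimal sketch of the underlying mechanism, assuming the OCR output file is UTF-8 (an assumption; the excerpt does not say so) and using a hypothetical helper `read_ocr_output` rather than any pytesseract API:

```python
import os
import tempfile


def read_ocr_output(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the OS default."""
    with open(path, encoding=encoding) as f:
        return f.read().strip()


# Curly quotes encode to bytes that include 0x9D, which cp1252 leaves undefined.
sample = "\u201cquoted\u201d\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    text = read_ocr_output(path)      # decodes cleanly as UTF-8
    try:
        sample.decode("cp1252")       # what the Windows default would attempt
        cp1252_ok = True
    except UnicodeDecodeError:
        cp1252_ok = False
finally:
    os.remove(path)
```

Passing `encoding=` explicitly wherever the file is opened removes the dependency on the machine's locale settings.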

String Comparison, .NET and non breaking space

断了今生、忘了曾经 submitted on 2020-06-24 11:18:25
Question: I have an app written in C# that does a lot of string comparison. The strings are pulled in from a variety of sources (including user input) and are then compared. However, I'm running into problems when comparing a space (character code 32) to a non-breaking space (character code 160). To the user they look the same, so they expect a match, but when the app does the comparison there is no match. What is the best way to go about this? Am I going to have to go to all parts of the code that do a string compare and manually
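One common way to handle the situation described above is to normalize strings once, at the point where they enter the application, so that comparison code elsewhere does not need to change. The sketch below is in Python for consistency with the other examples on this page; the question is about C#, and the function names are hypothetical.

```python
NBSP = "\u00a0"  # non-breaking space, character code 160


def normalize_spaces(text):
    """Replace non-breaking spaces with ordinary spaces (character code 32)."""
    return text.replace(NBSP, " ")


def equal_ignoring_nbsp(left, right):
    """Compare two strings, treating NBSP and space as the same character."""
    return normalize_spaces(left) == normalize_spaces(right)


user_input = "New\u00a0York"   # as pasted from a web page, for example
stored = "New York"
matches = equal_ignoring_nbsp(user_input, stored)
```

Normalizing at the input boundary keeps the rest of the code on plain equality checks, at the cost of losing the distinction between the two characters in stored data.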

Postgres upper function on turkish character does not return expected result

旧时模样 submitted on 2020-06-24 06:50:47
Question: It looks like the Postgres upper/lower functions do not handle certain characters in the Turkish character set.

    select upper('Aaı'), lower('Aaİ') from mytable;

returns AAı, aaİ instead of AAI, aai. Note that normal English characters are converted correctly, but not the Turkish I (lower or upper).

Postgres version: 9.2, 32-bit
Database encoding (same result with any of these): UTF-8, WIN1254, C
Client encoding: UTF-8, WIN1254, C
OS: Windows 7 Enterprise Edition, 64-bit

SQL functions lower and upper
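The mapping the asker expects can be written out explicitly. The sketch below is an application-side illustration in Python, not a Postgres fix: it applies the Turkish rules for the dotted and dotless I before the generic case conversion. The helper names are hypothetical.

```python
# Turkish pairs: i <-> İ (dotted) and ı <-> I (dotless).
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})


def turkish_upper(text):
    """Uppercase using Turkish rules for the two I letters."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text):
    """Lowercase using Turkish rules for the two I letters."""
    return text.translate(_TR_LOWER).lower()


upper_result = turkish_upper("Aa\u0131")   # the input from the query above
lower_result = turkish_lower("Aa\u0130")
```

For the two inputs in the query this yields the values the asker lists as expected (AAI and aai). Note that under these rules a plain `i` uppercases to `İ` and a plain `I` lowercases to `ı`, which is only appropriate for text known to be Turkish.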