cp1251

iTextSharp: parse HTML with Cyrillic/international words

萝らか妹 submitted on 2020-01-06 02:17:08
Question: I'm trying to parse an HTML file and generate a PDF. I use this code:

    document.Open();
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
    htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
    ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
    IPipeline pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
    XMLWorker worker = new XMLWorker(pipeline, true);
    XMLParser p = new XMLParser

UnicodeDecodeError in PyCharm debugger

旧时模样 submitted on 2019-12-22 05:06:44
Question: This refers to the question "UnicodeDecodeError while using cyryllic". I have the same problem with Python 3.3 and PyCharm 2.7.2. I tried hardcoding the encoding in the code and specifying it manually in the PyCharm options, but it had no effect: the debugger still tries to open a UTF-8 file with the cp1251 codec.

Connected to pydev debugger (build 129.314)
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.2\helpers\pydev\pydevd.py", line 1481, in <module>
    debugger.run(setup['file'], None, None)
  File "C:
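
A minimal Python 3 sketch of the failure mode described, assuming (as the question says) that a UTF-8 file is being opened as cp1251. In Python 3, open() without an explicit encoding uses the locale's preferred encoding, which is cp1251 on a Russian-locale Windows. The helper names write_utf8 and read_with are hypothetical. The sketch only reproduces the mechanism; it does not patch pydevd.

```python
import os
import tempfile


def write_utf8(text: str) -> str:
    """Write text to a temporary file as UTF-8 and return the file's path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    return path


def read_with(path: str, encoding: str):
    """Read the file with an explicit encoding; return the text or the decode error."""
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return exc


path = write_utf8('print("Иван")\n')
# 'И' is encoded in UTF-8 as D0 98, and byte 0x98 is unassigned in Python's
# cp1251 codec, so reading the file as cp1251 raises UnicodeDecodeError.
as_cp1251 = read_with(path, "cp1251")
# Naming the encoding the file was actually written in reads it back intact.
as_utf8 = read_with(path, "utf-8")
os.remove(path)
```

This is why the classic symptom is "'charmap' codec can't decode byte 0x98": byte 0x98 is the one position cp1251 leaves unassigned, and it shows up inside UTF-8 Cyrillic text.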

How can I convert a cp1251 byte array to a utf8 String?

无人久伴 submitted on 2019-12-19 10:26:47
Question: We don't have the cp1251 code page available on a phone, so new String(data, "cp1251") doesn't work. We need a function with a signature something like String ArrayCp1251toUTF8String(byte data[]);

Answer 1: First, there's no such thing as a "UTF-8 string" in Java; they're just strings. You don't need to worry about the string's encoding, only about the encoding of the bytes you're converting. Since cp1251 (or windows-1251) is a single-byte encoding, decoding is a simple matter of using the byte

Decoding a url-encoded windows-1251 (cp1251) string with JavaScript

你说的曾经没有我的故事 submitted on 2019-12-13 00:09:27
Question: I've run into a problem and, unfortunately, haven't found a proper solution: I need to decode a piece of a URL that is encoded in windows-1251 (cp1251). I know about decodeURI() and decodeURIComponent(), but as far as I understand they only work with UTF-8. The only solution I found uses the deprecated escape() and unescape() functions. For example, take this sequence: %EF%F0%EE%E3%F0%E0%EC%EC%E8%F0%EE%E2%E0%ED%E8%E5 (программирование). The methods decodeURI() and decodeURIComponent()
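
For comparison, a minimal Python sketch (not JavaScript) using the standard library's urllib.parse.unquote, whose encoding parameter states how the percent-escaped bytes should be decoded. In browsers, the comparable route should be to collect the bytes and decode them with TextDecoder('windows-1251').

```python
from urllib.parse import unquote

encoded = "%EF%F0%EE%E3%F0%E0%EC%EC%E8%F0%EE%E2%E0%ED%E8%E5"

# The default assumes UTF-8; these bytes are not valid UTF-8, so with the
# default errors="replace" they come out as U+FFFD replacement characters.
as_utf8 = unquote(encoded)

# Telling unquote which encoding the percent-escaped bytes use fixes it.
as_cp1251 = unquote(encoded, encoding="cp1251")  # 'программирование'
```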

The proper way of encoding detection in perl

你说的曾经没有我的故事 submitted on 2019-12-07 09:25:02
Question: I have these two strings:

%EC%E0%EC%E0+%EC%FB%EB%E0+%F0%E0%EC%F3
%D0%BC%D0%B0%D0%BC%D0%B0%20%D0%BC%D1%8B%D0%BB%D0%B0%20%D1%80%D0%B0%D0%BC%D1%83

They are the same Russian phrase, URL-encoded in cp1251 and UTF-8 respectively. I want to display both in Russian in my UTF-8 terminal using Perl. Unfortunately, the Perl module Encode::Detect (applied after URL-decoding) can't identify the first example as cp1251; instead, it suggests "x-euc-tw". The question is, what is the proper way of detecting the right
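
Charset detectors guess from statistics and can misfire on short inputs, as the "x-euc-tw" result shows. A common alternative heuristic, sketched here in Python rather than Perl and offered as an assumption rather than the accepted answer: try strict UTF-8 first and fall back to cp1251 only if that fails. Real Cyrillic text in cp1251 is very rarely valid UTF-8, because its letters occupy bytes 0xC0 to 0xFF, which in UTF-8 are lead bytes that must be followed by continuation bytes in 0x80 to 0xBF. decode_guess is a hypothetical name.

```python
from urllib.parse import unquote_to_bytes


def decode_guess(url_encoded: str) -> str:
    """Percent-decode, then decode as UTF-8 if valid, otherwise as cp1251."""
    # In form-encoded data '+' stands for a space; unquote_to_bytes leaves it alone.
    raw = unquote_to_bytes(url_encoded.replace("+", " "))
    try:
        # Strict UTF-8 decoding rejects typical cp1251 Cyrillic byte sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback; errors="replace" covers byte 0x98, which cp1251 leaves unassigned.
        return raw.decode("cp1251", errors="replace")


cp1251_input = "%EC%E0%EC%E0+%EC%FB%EB%E0+%F0%E0%EC%F3"
utf8_input = "%D0%BC%D0%B0%D0%BC%D0%B0%20%D0%BC%D1%8B%D0%BB%D0%B0%20%D1%80%D0%B0%D0%BC%D1%83"
# Both inputs decode to 'мама мыла раму'.
```

The heuristic only chooses between the two encodings it is told about; it is not a general detector.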

Python unicode behaviour in Google App Engine

大兔子大兔子 submitted on 2019-12-06 14:54:23
Question: I'm completely confused by GAE. I have a script that makes a POST request (using urlfetch from the Google App Engine API); the response is a cp1251-encoded HTML page. I decode it with .decode('cp1251') and parse it with lxml. My code works fine on my local machine:

    import re
    import leaf #simple wrapper for lxml
    weekdaysD={u'понедельник':1, u'вторник':2, u'среда':3, u'четверг':4, u'пятница':5, u'суббота':6}
    document = leaf.parse(leaf.strip_symbols(leaf.strip_accents(html_in
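
The decode-then-look-up step the question describes, sketched in Python 3 (the question itself is Python 2 on App Engine, where u'' literals play the role of str). This does not reproduce the App Engine versus local difference, which the truncated question doesn't show; it only illustrates that the decoded text, not the raw bytes, must be compared against the Unicode keys. WEEKDAYS and weekday_number are hypothetical names; the keys are copied from the question's weekdaysD.

```python
WEEKDAYS = {
    "понедельник": 1, "вторник": 2, "среда": 3,
    "четверг": 4, "пятница": 5, "суббота": 6,
}


def weekday_number(raw: bytes):
    """Return the weekday number for a cp1251-encoded weekday name, or None."""
    # Decode bytes to text first; in Python 3, bytes never compare equal to str keys.
    name = raw.decode("cp1251").strip().lower()
    return WEEKDAYS.get(name)


day = weekday_number("Среда".encode("cp1251"))  # 3
```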

UnicodeDecodeError in PyCharm debugger

天大地大妈咪最大 submitted on 2019-12-05 04:41:54
This refers to the question "UnicodeDecodeError while using cyryllic". I have the same problem with Python 3.3 and PyCharm 2.7.2. I tried hardcoding the encoding in the code and specifying it manually in the PyCharm options, but it had no effect: the debugger still tries to open a UTF-8 file with the cp1251 codec.

Connected to pydev debugger (build 129.314)
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.2\helpers\pydev\pydevd.py", line 1481, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.2\helpers\pydev\pydevd.py", line 1124, in run
    pydev_imports

How can I convert a cp1251 byte array to a utf8 String?

半城伤御伤魂 submitted on 2019-12-01 10:54:03
We don't have the cp1251 code page available on a phone, so new String(data, "cp1251") doesn't work. We need a function with a signature something like String ArrayCp1251toUTF8String(byte data[]);

First, there's no such thing as a "UTF-8 string" in Java; they're just strings. You don't need to worry about the string's encoding, only about the encoding of the bytes you're converting. Since cp1251 (or windows-1251) is a single-byte encoding, decoding is a simple matter of using the byte value as an index into an appropriate array of char values. Here's an example: static String decodeCp1251(byte[

How to convert a string from CP-1251 to UTF-8?

爱⌒轻易说出口 submitted on 2019-11-30 12:46:24
Question: I'm using mutagen to convert ID3 tag data from CP-1251/CP-1252 to UTF-8. On Linux there is no problem, but on Windows, calling SetValue() on a wx.TextCtrl produces this error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The original string (assumed to be CP-1251 encoded) that I'm pulling from mutagen is:

    u'\xc1\xe5\xeb\xe0\xff \xff\xe1\xeb\xfb\xed\xff \xe3\xf0\xee\xec\xf3'

I've tried converting this to UTF-8:

    dd = d.decode('utf-8')

...and
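
The u'\xc1\xe5...' value looks like a common kind of mojibake: a Unicode string whose code points are really the raw cp1251 bytes, as if those bytes had been decoded as Latin-1. Assuming that is what happened, the usual repair is to encode back to bytes with Latin-1 (which maps code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF) and then decode those bytes as cp1251. A Python 3 sketch; the helper name repair_cp1251_mojibake is hypothetical. In Python 2, the equivalent expression would be d.encode('latin-1').decode('cp1251').

```python
# The value from the question: code points that are really cp1251 byte values.
garbled = "\xc1\xe5\xeb\xe0\xff \xff\xe1\xeb\xfb\xed\xff \xe3\xf0\xee\xec\xf3"


def repair_cp1251_mojibake(text: str) -> str:
    """Reinterpret a Latin-1-decoded string as the cp1251 text it really was."""
    # Latin-1 turns each code point below U+0100 back into the identical byte value.
    raw = text.encode("latin-1")
    # Decode those bytes with the encoding they were actually written in.
    return raw.decode("cp1251")


fixed = repair_cp1251_mojibake(garbled)  # 'Белая яблыня грому'
```

Calling .decode('utf-8') on the garbled value cannot help, because the problem is not a missing conversion to UTF-8 but characters that were decoded with the wrong codec in the first place.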

How to convert a string from CP-1251 to UTF-8?

旧城冷巷雨未停 submitted on 2019-11-30 03:04:05
I'm using mutagen to convert ID3 tag data from CP-1251/CP-1252 to UTF-8. On Linux there is no problem, but on Windows, calling SetValue() on a wx.TextCtrl produces this error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The original string (assumed to be CP-1251 encoded) that I'm pulling from mutagen is:

    u'\xc1\xe5\xeb\xe0\xff \xff\xe1\xeb\xfb\xed\xff \xe3\xf0\xee\xec\xf3'

I've tried converting this to UTF-8:

    dd = d.decode('utf-8')

...and even changing the default encoding from ASCII to UTF-8:

    sys.setdefaultencoding('utf-8')

...But I get