utf

Why is sys.getdefaultencoding() different from sys.stdout.encoding and how does this break Unicode strings?

一曲冷凌霜 submitted on 2019-11-30 09:44:57
I spent a few angry hours looking for a problem with Unicode strings that boiled down to something Python (2.7) hides from me, and I still don't understand it. First, I tried to use u".." strings consistently in my code, but that resulted in the infamous UnicodeEncodeError. I tried using .encode('utf8'), but that didn't help either. Finally, it turned out I shouldn't use either and it all works out automagically. However (here I need to give credit to a friend who helped me), I did notice something weird while banging my head against the wall: sys.getdefaultencoding() returns ascii,
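
A minimal Python 2.7 sketch of the two settings in play (not the asker's code; it assumes a console whose encoding differs from the 'ascii' default, e.g. a UTF-8 terminal):

    # -*- coding: utf-8 -*-
    import sys

    print sys.getdefaultencoding()  # usually 'ascii' on Python 2
    print sys.stdout.encoding       # detected from the terminal, e.g. 'UTF-8';
                                    # None when stdout is redirected to a pipe

    text = u"héllo"

    # Printing a unicode object makes Python pick a codec: the terminal's
    # encoding when attached to a console, but the 'ascii' default when piped,
    # which is where the UnicodeEncodeError tends to appear.
    print text

    # Being explicit avoids depending on either default:
    sys.stdout.write(text.encode(sys.stdout.encoding or "utf-8") + "\n")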

Reading PHP-generated XML in Flash?

对着背影说爱祢 submitted on 2019-11-29 15:54:46
Here is part 1 of our problem (Loading a dynamically generated XML file as PHP in Flash). We were able to get Flash to read the XML file, but the Flash only renders correctly when tested (Test Movie) from within the Flash authoring tool itself. When we upload our files online to preview, the Flash does not render correctly and is missing some vital information (thumbnails, titles, video, etc.). Additional information: the SWF file lives on Domain 1, the XML and PHP files both live on Domain 2, and the HTML file with the embed code lives on Domain 3. Wondering if this could be a cross-domain issue?
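
With the SWF, the XML/PHP, and the embedding HTML on three different domains, the first thing to check is the Flash cross-domain policy. A minimal crossdomain.xml sketch, assuming it is served from the web root of Domain 2 (where the XML/PHP lives); the wildcard is for testing only and should be narrowed to the SWF's domain afterwards:

    <?xml version="1.0"?>
    <!DOCTYPE cross-domain-policy SYSTEM
      "http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
    <cross-domain-policy>
      <!-- Testing only: replace "*" with the domain that serves the SWF -->
      <allow-access-from domain="*" />
    </cross-domain-policy>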

SQL doesn't differentiate u and ü although collation is utf8mb4_unicode_ci

核能气质少年 submitted on 2019-11-29 10:05:49
In a table x, there is a column with the values u and ü. SELECT * FROM x WHERE column='u' returns both u AND ü, although I am only looking for u. The table's collation is utf8mb4_unicode_ci. Wherever I read about similar problems, everyone suggests using this collation because they say utf8mb4 really covers ALL characters, and that with this collation all character set and collation problems should be solved. I can insert ü, è, é, à, Chinese characters, etc. When I do a SELECT *, they are also retrieved and displayed correctly. The problem only occurs when I COMPARE two
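
utf8mb4 only determines which characters can be stored; the collation determines how they compare, and the *_unicode_ci family deliberately treats u and ü (likewise e/é, etc.) as equal. A sketch of the usual workaround, assuming MySQL/MariaDB and the placeholder table/column names from the question:

    -- With utf8mb4_unicode_ci, 'u' and 'ü' compare as equal, so this matches both rows:
    SELECT * FROM x WHERE `column` = 'u';

    -- Forcing a binary collation just for this comparison returns only 'u'
    -- (note it is also case-sensitive):
    SELECT * FROM x WHERE `column` = 'u' COLLATE utf8mb4_bin;

    -- MySQL 8.0+ also offers accent-sensitive but case-insensitive collations:
    SELECT * FROM x WHERE `column` = 'u' COLLATE utf8mb4_0900_as_ci;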

UTF Encoding for Chinese Characters in Java

六月ゝ 毕业季﹏ submitted on 2019-11-29 04:28:41
I am receiving a String via an object from an Axis web service. Because I'm not getting the string I expected, I did a check by converting the string into bytes, and I get C3A4C2 BDC2A0 C3A5C2 A5C2BD C3A5C2 90C297 in hex when I'm expecting E4BDA0 E5A5BD E59097, which is 你好吗 in UTF-8. Any ideas what might be causing 你好吗 to become C3A4C2 BDC2A0 C3A5C2 A5C2BD C3A5C2 90C297? I did a Google search but all I got was a Chinese website describing a problem that happens in Python. Any insights would be great, thanks! You have what is known as a double encoding. You have the three character
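
That byte pattern is the classic fingerprint of double encoding: UTF-8 bytes that were decoded as ISO-8859-1 (Latin-1) somewhere in the pipeline and then encoded to UTF-8 a second time. A short Python sketch of the mechanism (illustration only; the original code is Java):

    # Reproduce the double encoding: UTF-8 bytes misread as Latin-1,
    # then re-encoded as UTF-8.
    good = "你好吗"
    once = good.encode("utf-8")                     # e4bda0 e5a5bd e59097
    twice = once.decode("latin-1").encode("utf-8")  # c3a4c2bdc2a0 c3a5c2a5c2bd ...
    print(twice.hex())

    # Undo it: treat the mojibake's code points as Latin-1 bytes and
    # decode those bytes as UTF-8 again.
    mojibake = twice.decode("utf-8")
    print(mojibake.encode("latin-1").decode("utf-8"))  # 你好吗

The equivalent repair in Java would be new String(s.getBytes("ISO-8859-1"), "UTF-8"), but the cleaner fix is to remove the stray Latin-1 decode on the sending or receiving side of the Axis service.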

Is there a field in which PDF files specify their encoding?

别说谁变了你拦得住时间么 submitted on 2019-11-29 03:19:58
I understand that it is impossible to determine the character encoding of arbitrary string data just by looking at the data. That is not my question. My question is: is there a field in a PDF file where, by convention, the encoding scheme is specified (e.g. UTF-8)? This would be roughly analogous to <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> in HTML. Thank you very much in advance, Blz. A quick look at the PDF specification seems to suggest that you can have different encodings inside a PDF file. Have a look at page 86. So a PDF library with some

ISO-8859-1 vs UTF-8?

做~自己de王妃 submitted on 2019-11-28 16:30:23
What should be used, and when? Or is it always better to use UTF-8? Or does ISO-8859-1 still have importance in specific conditions? Is the character set related to geographic region? Edit: Is there any benefit to putting @charset "utf-8"; or something like <link type="text/css; charset=utf-8" rel="stylesheet" href=".." /> at the top of a CSS file? I found this about it: if DreamWeaver adds the tag when you add an embedded style to the document, that is a bug in DreamWeaver. From the W3C FAQ: "For style declarations embedded in a document, @charset rules are not needed and must not be used." The charset
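
As a quick illustration of the practical difference (a Python sketch added here, not part of the original question): ISO-8859-1 can represent only 256 code points, while UTF-8 can encode every Unicode character, and the two produce identical bytes for plain ASCII.

    # Byte-level difference between ISO-8859-1 and UTF-8.
    text = "café"
    print(text.encode("iso-8859-1").hex())  # 636166e9   -> 1 byte for 'é'
    print(text.encode("utf-8").hex())       # 636166c3a9 -> 2 bytes for 'é'

    # Anything outside the Latin-1 repertoire simply cannot be encoded:
    try:
        "日本語".encode("iso-8859-1")
    except UnicodeEncodeError as exc:
        print(exc)  # 'latin-1' codec can't encode characters in position 0-2 ...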

Invalid URI with Chinese characters (Java)

有些话、适合烂在心里 submitted on 2019-11-28 12:44:41
Having trouble setting up a URL connection with Chinese characters in the URL. It works with Latin characters: String xstr = "维也纳恩斯特哈佩尔球场"; URI uri = new URI("http","ajax.googleapis.com","/ajax/services/language/detect","v=1.0&q="+xstr,null); URL url = uri.toURL(); URLConnection connection = url.openConnection(); InputStream is = connection.getInputStream(); The getInputStream() call results in: java.lang.IllegalArgumentException: Invalid uri 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=???????????': Invalid query The problem is caused by the fact that URI.toURL() doesn
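
Whatever language builds the request, the query string has to reach the wire percent-encoded as UTF-8 bytes. A quick Python sketch of the target form (illustration only; the original code is Java):

    from urllib.parse import quote

    # Percent-encode the UTF-8 bytes of the Chinese query value.
    xstr = "维也纳恩斯特哈佩尔球场"
    query = "v=1.0&q=" + quote(xstr, safe="")
    print(query)  # v=1.0&q=%E7%BB%B4%E4%B9%9F%E7%BA%B3...

On the Java side, one commonly suggested fix for this setup is to build the URL from uri.toASCIIString(), which applies exactly this percent-encoding, rather than calling uri.toURL() directly.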

What is the most correct way to set the encoding in C++?

那年仲夏 submitted on 2019-11-28 12:10:36
What is the best way to set the encoding in C++? I am used to working with Unicode (and wchar_t, wstring, wcin, wcout, and L"..."). I also save my source as UTF-8. At the moment I use MinGW (Windows 7) and run my program in the Windows console (cmd.exe), but sometimes I use gcc on GNU/Linux and run the program in a Linux console with UTF-8 encoding. I always want to be able to compile my source on both Windows and Linux, and I want all Unicode symbols to be input and output correctly. When I ran into the next encoding problem, I googled and found all kinds of different advice: setlocale(LC

Is there a way in Ruby 1.9 to remove invalid byte sequences from strings?

ε祈祈猫儿з submitted on 2019-11-28 07:19:59
Suppose you have a string like "€foo\xA0", encoded as UTF-8. Is there a way to remove the invalid byte sequences from this string (so you get "€foo")? In Ruby 1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€foo\xA0"), but that is now deprecated. "€foo\xA0".encode('UTF-8') doesn't do anything, since the string is already tagged as UTF-8. I tried: "€foo\xA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => '') which yields "foo", but that also loses the valid multibyte character €. Evgenii: "€foo\xA0".chars.select(&:valid_encoding?).join Van der Hoorn: "€foo\xA0".encode('UTF-16le',
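
For comparison, the same clean-up expressed in Python (an added illustration, not part of the question): decoding with errors='ignore' drops the stray byte but keeps the valid multibyte €.

    # Strip invalid byte sequences while keeping valid multibyte characters.
    raw = "€foo".encode("utf-8") + b"\xa0"       # b'\xe2\x82\xacfoo\xa0'
    print(raw.decode("utf-8", errors="ignore"))  # €foo

On the Ruby side, later versions (2.1+) added String#scrub for exactly this; on 1.9 the usual workaround is a round-trip through another encoding with :invalid => :replace, which is where the truncated answer above is headed.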