character-encoding

Java: Detect non-displayable chars for a given Character Encoding

ε祈祈猫儿з posted on 2019-12-30 10:53:28
Question: I'm currently working on an application to validate and parse CSV files. The CSV files have to be encoded in UTF-8, although sometimes we get files in the wrong encoding. The CSV files most likely contain special characters of the German alphabet (Ä, Ö, Ü, ß), as most of the text within them is in German. For the validator part, I need to make sure the file is UTF-8 encoded. As long as there are no special characters present, there is most likely no problem with
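A minimal sketch of the kind of check described above, written in Python rather than Java purely for illustration. The function name `is_valid_utf8` is hypothetical. It relies on the fact that a strict UTF-8 decoder rejects byte sequences that are not well-formed UTF-8, which is typically what happens when German umlauts arrive encoded as Latin-1.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII is valid UTF-8, so it passes regardless of the intended encoding.
ascii_only = b"plain ascii"
# The same German word encoded two ways.
as_utf8 = "Äpfel".encode("utf-8")
as_latin1 = "Äpfel".encode("latin-1")

results = (is_valid_utf8(ascii_only), is_valid_utf8(as_utf8), is_valid_utf8(as_latin1))
```

A check like this only proves that the bytes are well-formed UTF-8. It cannot prove that UTF-8 was the encoding the author intended.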

UTF-8 French accented characters issue

ぃ、小莉子 posted on 2019-12-30 09:37:22
Question: When I view the data stored in the MySQL database using phpMyAdmin, the characters are stored exactly as é à ç. However, when I use PHP to display this data in an HTML document that has exactly the following structure: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title></title> </head> <body> </body> <

Is '\u0B95' a multicharacter literal?

你离开我真会死。 posted on 2019-12-30 08:12:01
Question: In a previous answer I gave, I attributed the following warning to the fact that '\u0B95' requires three bytes and so is a multicharacter literal: warning: multi-character character constant [-Wmultichar] But actually, I don't think I'm right, and I don't think gcc is either. The standard states: An ordinary character literal that contains more than one c-char is a multicharacter literal. One production rule for c-char is a universal-character-name (i.e. \uXXXX or \UXXXXXXXX )
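To illustrate where the "three bytes" figure comes from, here is a small Python sketch. It does not settle the C++ standard question. It only shows that U+0B95 is a single code point whose UTF-8 encoding happens to be three bytes long. The variable names are illustrative.

```python
ch = "\u0B95"  # a single Unicode code point (TAMIL LETTER KA)

# UTF-8 encodes code points in the range U+0800..U+FFFF as three bytes.
utf8_bytes = ch.encode("utf-8")

code_point_count = len(ch)                             # 1
byte_count = len(utf8_bytes)                           # 3
byte_values = " ".join(f"{b:02x}" for b in utf8_bytes)  # "e0 ae 95"
```

So the character is one c-char in the source, while its narrow encoding spans several bytes. That distinction is what the question turns on.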

How to encode quotes in HTML body?

元气小坏坏 posted on 2019-12-30 08:09:11
Question: Should I encode quotes (such as " and ' -> &rdquo; and &rsquo;) in my HTML body (e.g. convert <p>Matt's Stuff</p> to <p>Matt&rsquo;s Stuff</p>)? I was under the impression I should, but a co-worker said that it was no big deal. I'm dubious, but I can't find anything that says it is forbidden. Am I mistaken? Is it a best practice to encode? Or is it simply useless? Answer 1: Encoding quotation marks (") is in practice only needed if they're inside an attribute, however for the HTML code to be correct (passing HTML
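The distinction between body text and attribute values can be sketched with Python's standard `html.escape`, whose `quote` flag controls whether quotation marks are escaped. The sample string and variable names are made up for illustration, and the snippet escapes straight quotes rather than substituting typographic ones.

```python
import html

text = 'Matt\'s "Stuff"'

# In element content only &, < and > need escaping, so quotes can stay as they are.
for_body = html.escape(text, quote=False)

# Inside an attribute value the quotes must be escaped as well.
for_attribute = html.escape(text, quote=True)

markup = f'<p title="{for_attribute}">{for_body}</p>'
```

Escaping quotes in the body as well is harmless. It is simply not required for the markup to parse correctly.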

Illegal mix of collations in mySQL

空扰寡人 posted on 2019-12-30 08:06:07
Question: I need to transfer a column from one table to another. The source table has a different collation than the target table (latin1_general_ci and latin1_swedish_ci). I use UPDATE target LEFT JOIN source ON target.artnr = source.artnr SET target.barcode = source.barcode and get an "illegal mix of collations" error. What is a quick fix to get this working without having to change either table? I tried CONVERT and COLLATE to run the whole operation in UTF-8, but that didn't help. "barcode" contains numeric

How is this website fixing the encoding?

倖福魔咒の posted on 2019-12-30 07:42:38
Question: I am trying to turn this text: ×וויר. העתיד של רשתות חברתיות והתקשורת ×©×œ× ×• into this text: אוויר. העתיד של רשתות חברתיות והתקשורת שלנו Somehow, this website: http://www.pixiesoft.com/flip/ can do it, and I would like to know how I might be able to do it myself (with whatever programming language or software). Just saving the file as UTF-8 won't do it. My motivation for this question is that I have a friend's exported XML file with the garbled text which I want
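One common way to undo this kind of garbling is to reverse the wrong decoding step: re-encode the text with the codec that was wrongly applied, then decode the resulting bytes as UTF-8. The sketch below assumes the wrong codec was Windows-1252. That is a guess based on the look of the sample, not something the question confirms, and `fix_mojibake` is a hypothetical helper name.

```python
def fix_mojibake(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Reverse a UTF-8 -> wrong_codec mis-decoding (assumed cause of the garbling)."""
    return garbled.encode(wrong_codec).decode("utf-8")


original = "שלנו"
# Reproduce the damage: UTF-8 bytes interpreted as Windows-1252 text.
garbled = original.encode("utf-8").decode("cp1252")
repaired = fix_mojibake(garbled)
```

The pasted sample may already have lost bytes that Windows-1252 cannot represent, so a full round trip of that exact string is not guaranteed. Working from the raw bytes of the XML file is usually more reliable than working from copied text.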

Cannot store UTF8 characters in MySQL

泪湿孤枕 posted on 2019-12-30 07:24:27
Question: I cannot find the reason why I am unable to store characters like ţ, î, ş in a MySQL database. My table definition is: CREATE TABLE IF NOT EXISTS `gen_admin_words_translated` ( `id` int(10) NOT NULL AUTO_INCREMENT, `word_id` int(10) NOT NULL, `value` text COLLATE utf8_unicode_ci, `lang_id` int(2) NOT NULL, `needUpd` int(1) NOT NULL DEFAULT '1', PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=2689 ; The connection to the database is done with the

Adding encoding alias to python

跟風遠走 posted on 2019-12-30 06:59:27
Question: Is there a way to add an encoding alias to Python? There are sites on the web that use the encoding 'windows-1251' but have their charset set to win-1251, so I would like win-1251 to be an alias for windows-1251. Answer 1: The encodings module is not well documented, so I'd instead use codecs, which is: import codecs def encalias(oldname, newname): old = codecs.lookup(oldname) new = codecs.CodecInfo(old.encode, old.decode, streamreader=old.streamreader, streamwriter=old