character-encoding

PHP - detecting the character set of a user-supplied string

时光毁灭记忆、已成空白 submitted on 2020-01-06 05:39:08
Question: Is it possible to detect the character set of a user-supplied string? If not, how about the next question: are there reliable built-in PHP functions that can accurately tell whether a user-supplied string (be it supplied through get/post/cookie etc.) is in UTF-8 or not? In other words, can I do something like is_utf8($_GET['first_name'])? Is there any way this function could produce TRUE where in reality the first_name was not in UTF-8?

Answer 1: Regarding 1: You can give mb_detect_encoding a try, but it's pretty…
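In PHP the usual tool for this check is mb_check_encoding($s, 'UTF-8'). As a sketch of the underlying idea (in Python, since the check is the same in any language): a byte string is valid UTF-8 exactly when a strict UTF-8 decode succeeds.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(is_utf8("héllo".encode("utf-8")))  # well-formed UTF-8
print(is_utf8(b"\xc3\x28"))              # 0xC3 needs a continuation byte
```

Note the asker's false-positive worry is real: this only proves the bytes are *well-formed* UTF-8. Pure ASCII always passes, and some byte sequences from other encodings happen to be valid UTF-8 too, so "valid" does not guarantee the sender *meant* UTF-8.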

Bengali encoding

时间秒杀一切 submitted on 2020-01-06 04:56:04
Question: Does anybody know what encoding this is (in the Bengali language): Bs‡iRx eY©gvjvi cO_g eY© ‡fovi WvK, QvM‡ji e?v e?v WvK ‡Nvovi Mvwo; fvov‡U †gvUiMvwo. As an example, this web site seems to use it: http://www.shipbreakingbd.info (it uses its own font to render the contents; it's just an example). I got a text file in this encoding which I need to convert to UTF-8. How can I do it?

Answer 1: This is a glyph-based encoding used in Mustafa Jabbar's Bengali font series published from Dhaka. But this…
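Converting a glyph-encoded file like this to Unicode needs a mapping table from the font's glyph codes to Bengali code points; that table must be extracted from the specific font, and the two entries below are purely hypothetical placeholders. A greedy longest-match substitution sketch:

```python
# Hypothetical glyph-to-Unicode table. Real entries must come from the
# actual font's glyph map; these two are illustrative only.
GLYPH_MAP = {
    "Bs": "\u0987",  # hypothetical: two-glyph sequence -> one Bengali letter
    "e": "\u09ac",   # hypothetical: single glyph -> one Bengali letter
}

def convert(text: str) -> str:
    """Greedy longest-match substitution over the glyph table."""
    keys = sorted(GLYPH_MAP, key=len, reverse=True)  # longest keys first
    out, i = [], 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(GLYPH_MAP[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # pass unmapped characters through
            i += 1
    return "".join(out)
```

Longest-match-first matters because glyph encodings of this family map some conjuncts to multi-character sequences that share prefixes with single-glyph entries.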

java.nio.charset.UnsupportedCharsetException: X-MAC-ROMAN in Jsoup getting a webpage

久未见 submitted on 2020-01-06 04:04:30
Question: I have Document document = Jsoup.connect(link).get(); and sometimes, for some URLs, I get an exception:

Exception in thread "main" java.nio.charset.UnsupportedCharsetException: X-MAC-ROMAN
    at java.nio.charset.Charset.forName(Unknown Source)
    at org.jsoup.helper.DataUtil.parseByteData(DataUtil.java:86)
    at org.jsoup.helper.HttpConnection$Response.parse(HttpConnection.java:469)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:147)

I have a catch block as: catch (IOException e1). I…
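One common workaround (a sketch of the idea, not Jsoup's own API) is to check whether the runtime supports the declared charset name and substitute a fallback before parsing. The same guard expressed in Python via codecs.lookup:

```python
import codecs

def safe_charset(declared: str, fallback: str = "utf-8") -> str:
    """Return the declared charset if the runtime supports it, else a fallback."""
    try:
        return codecs.lookup(declared).name  # canonical name if supported
    except LookupError:
        return fallback

print(safe_charset("ISO-8859-1"))       # supported: canonical name returned
print(safe_charset("X-TOTALLY-BOGUS"))  # unsupported: falls back to utf-8
```

In Java the analogous guard is Charset.isSupported(name) wrapped around Charset.forName; with Jsoup one can then fetch the bytes separately and hand the sanitized charset name to Jsoup.parse(InputStream, charsetName, baseUri).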

Convert text to Latin encoding and decode it back: a problem with Vietnamese

孤街醉人 submitted on 2020-01-06 04:01:59
Question: I'm trying to convert Vietnamese to Latin. It is a requirement to send the bytes to an ESC/P printer (see "C# ESC/POS Print Vietnamese" for the reason why). But my question is very simple; look at this code:

Encoding enc = Encoding.GetEncoding(1258); // Vietnamese code page
string content = "Cơm chiên với các loại gia vị truyền";
string newStr = Encoding.GetEncoding("Latin1").GetString(enc.GetBytes(content));
string origStr = enc.GetString(Encoding.GetEncoding("Latin1").GetBytes(newStr));
// origStr is…
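The round trip in that snippet works because Latin-1 maps every byte value 0–255 to a code point one-to-one, so bytes -> Latin-1 string -> bytes is lossless: Latin-1 acts as a neutral "carrier" for bytes produced by another encoding (here, code page 1258, Windows' Vietnamese code page). A minimal demonstration of that property in Python:

```python
# Any byte sequence survives a Latin-1 decode/encode round trip,
# which is why Latin-1 can carry bytes of another encoding intact.
raw = bytes(range(256))                  # every possible byte value
carrier = raw.decode("latin-1")          # 1 byte -> 1 char, never fails
assert carrier.encode("latin-1") == raw  # lossless round trip
```

The corollary for the question: origStr equals content as long as nothing between the two conversions normalizes or re-encodes the carrier string.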

http_build_query putting strange chars in query string

♀尐吖头ヾ submitted on 2020-01-06 03:57:13
Question: I'm using cURL to submit a form, and to do that I'm using PHP's http_build_query() to form a query string. I was wondering why the form didn't submit, and then I echoed out the query string only to find a '¶' and a 'ð' in it.

$post_data = array('terms' => 'true', 'ethnicity' => 0, 'param0' => 'Lance', 'param1' => 'Newman');
$post_data = http_build_query($post_data);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
echo $post_data;

Returns…
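The '¶' and 'ð' almost certainly come from the *browser rendering the echoed string as HTML*, not from http_build_query itself: &param0 and &ethnicity begin with the legacy named entities &para and &eth, which HTML5 resolves even without a trailing semicolon. A sketch of the effect using Python's HTML5-conformant html.unescape:

```python
import html

# The query string http_build_query would produce for that array.
query = "terms=true&ethnicity=0&param0=Lance&param1=Newman"

# A browser rendering this as HTML resolves legacy entities that need
# no trailing semicolon: &para -> '¶', &eth -> 'ð'.
print(html.unescape(query))
```

So the query string itself is fine to POST; the fix for the display artifact is to HTML-escape it before echoing (htmlspecialchars() in PHP).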

Facelets charset problem

旧街凉风 submitted on 2020-01-06 03:15:26
Question: In my earlier post there was a problem with JSF charset handling, but the other part of the problem was the MySQL connection parameters for inserting data into the db. That problem was solved. But I migrated the same application from JSP to Facelets and the same problem happened again: characters from input fields are replaced when inserted into the database (č is replaced with Ä), while data inserted into the db from SQL scripts with the proper charset is displayed correctly. I'm still using a registered…
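The reported corruption (č arriving as Ä) is the classic signature of UTF-8 bytes being re-read as a single-byte encoding somewhere in the request-to-database path: č is the two bytes 0xC4 0x8D in UTF-8, and 0xC4 alone is Ä in Latin-1/Windows-1252. A minimal reproduction:

```python
text = "č"                         # U+010D LATIN SMALL LETTER C WITH CARON
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                  # b'\xc4\x8d'

# Re-reading those bytes as Latin-1 splits the character in two;
# the first visible half is 'Ä', matching the symptom in the question.
mangled = utf8_bytes.decode("latin-1")
print(mangled[0])
```

The cure is declaring UTF-8 consistently end to end (page encoding, a request character-encoding filter, and the JDBC connection parameters), rather than repairing strings after the fact.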

Copying text from Word to textarea

房东的猫 submitted on 2020-01-05 18:47:12
Question: A well-known problem: when copying text from MS Word into a textarea, the text's characters get converted to strange characters when saving it to the database. I was wondering how I should solve this:

1. Character encoding of the HTML document that holds the form
2. A before-save method to sanitize the data
3. Sanitization after retrieving the data (before displaying)
4. A database configuration (character encoding for the table)

I would prefer if 1. worked, but any other solution will do.

Answer 1: The solution:…
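Sanitizing before saving or after retrieving usually means mapping Word's "smart" punctuation (curly quotes, en/em dashes, ellipsis, all from the Windows-1252 range) to plain ASCII equivalents, though getting the encoding right end to end is the real cure. A hedged sketch of such a sanitizer:

```python
# Word "smart" punctuation mapped to plain ASCII equivalents.
SMART_MAP = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # horizontal ellipsis
}

def desmart(text: str) -> str:
    """Replace Word's smart punctuation with ASCII equivalents."""
    return text.translate(str.maketrans(SMART_MAP))
```

This is lossy by design (typographic nuance is flattened), so it belongs in option 2/3 territory as a fallback; with correct UTF-8 handling the characters can simply be stored as-is.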

Curl: get UTF-8 data from site with incorrect charset

老子叫甜甜 submitted on 2020-01-05 13:17:09
Question: I scrape some sites that occasionally have UTF-8 characters in the title but don't specify UTF-8 as the charset (qq.com is an example). When I look at the website in my browser, the data I want to copy (i.e. the title) looks correct (Japanese or Chinese, I'm not too sure). I can copy the title and paste it into the terminal and it looks exactly the same. I can even write it to the DB, and when I retrieve it from the DB it still looks the same, and correct. However, when I use cURL, the data…
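When a page's declared charset can't be trusted, a pragmatic decoding order for scraped bytes (a sketch of a common heuristic, not the only one) is: try strict UTF-8 first, since a non-UTF-8 page almost never decodes as valid UTF-8 by accident; then the declared charset; then Latin-1, which never fails.

```python
def decode_scraped(body: bytes, declared: str = "") -> str:
    """Decode scraped page bytes: strict UTF-8 first, then the
    declared charset, then Latin-1 as a never-failing last resort."""
    try:
        return body.decode("utf-8")       # valid UTF-8 is rarely accidental
    except UnicodeDecodeError:
        pass
    if declared:
        try:
            return body.decode(declared)
        except (UnicodeDecodeError, LookupError):
            pass
    return body.decode("latin-1")          # maps every byte, never raises
```

Libraries such as chardet automate this with statistical detection, but the strict-UTF-8-first rule alone covers the qq.com-style case the question describes.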

Python Character Encoding European Accents

狂风中的少年 submitted on 2020-01-05 12:35:57
Question: I know this is not an uncommon problem and that there are already multiple SO questions answered about this (1, 2, 3), but even when following the recommendations there I am still seeing this error (for the code below):

uri_name = u"%s_%s" % (name[1].encode('utf-8').strip(), name[0].encode('utf-8').strip())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

So I am trying to get a URL from a list of artist names, a lot of which have accents and…
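The traceback points at the root cause: a UnicodeDecodeError raised by an encode call means the items in name are byte strings, and Python 2 first *decodes* them with the ASCII codec before re-encoding, which fails on 0xC3 (the first byte of a UTF-8 accented character). The fix is to decode the bytes once with the right codec rather than encode. A sketch in Python 3 with a hypothetical artist name, plus urllib.parse.quote for the URL step:

```python
from urllib.parse import quote

# Hypothetical raw record: UTF-8 bytes as they might arrive from a file or API.
name = (b"Beyonc\xc3\xa9", b"Knowles")

# Decode bytes -> text first; only then build the URI fragment.
first = name[0].decode("utf-8").strip()
last = name[1].decode("utf-8").strip()
uri_name = "%s_%s" % (last, first)

# Percent-encode the non-ASCII characters for use in a URL.
print(quote(uri_name))
```

Mixing bytes and text is the whole disease here; keeping everything as decoded text until the final percent-encoding step removes the implicit ASCII round trip entirely.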
