encoding | 易学教程

Filtering text encoded with utf-8 to only contain latin alphabet characters

Question: I'm trying to filter text data so that it only contains Latin characters, for further text analysis. The original text source most likely contained the Korean alphabet. This shows up like this in the text file: \xe7\xac\xac8\xe4\xbd\x8d ONE PIECE FILM GOLD Blu-ray GOLDEN LIMITED EDITION What would be the fastest/easiest/most complete way to remove these? I tried making a script that would remove all \xXX combinations, but it turns out that there are too many exceptions for this to be reliable. Is there
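One possible approach to the filtering described in this question, sketched in Python. It assumes the file holds raw UTF-8 bytes (the \xXX sequences look like a byte-string representation, but the excerpt does not confirm this). The function name `keep_ascii` is hypothetical, and restricting the output to ASCII is a simplification: accented Latin letters such as é would be removed too.

```python
import re


def keep_ascii(raw: bytes) -> str:
    """Decode UTF-8 bytes and keep only ASCII characters."""
    # Decode first, so that every multi-byte sequence becomes one character
    # instead of several \xXX escapes that would have to be matched by hand.
    text = raw.decode("utf-8", errors="ignore")
    kept = "".join(ch for ch in text if ch.isascii())
    # Collapse whitespace runs left behind by the removed characters.
    return re.sub(r"\s+", " ", kept).strip()


sample = b"\xe7\xac\xac8\xe4\xbd\x8d ONE PIECE FILM GOLD Blu-ray GOLDEN LIMITED EDITION"
print(keep_ascii(sample))
```

Note that the digit 8 between the two non-Latin characters survives, because it is plain ASCII.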

Conversion from javascript-escaped Unicode to Java Unicode

Question: I have a query string passed in through an HTTP request that has this character in it: %u54E6 I'd like to generate a string that contains the actual Chinese character so I can use it in a different part of the application. I've tried using this code: String foo = "%u54E6"; String ufoo = new String(foo.replaceAll("%u([a-zA-Z0-9]{4})", "\\" + "u$1")); System.out.println("ufoo: " + ufoo); Unfortunately, all I'm getting is 'u54E6' printed to the console for the value, instead of the Chinese
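The attempt above prints u54E6 because `\uXXXX` escapes are resolved by the Java compiler in source text, not at run time, and in a `replaceAll` replacement string a backslash only escapes the character that follows it. The conversion has to parse the four hex digits into a code point. A minimal Python sketch of that idea follows; the function name `decode_percent_u` is hypothetical, and the sketch does not join surrogate pairs for characters outside the Basic Multilingual Plane.

```python
import re

# Matches the non-standard %uXXXX form produced by JavaScript's escape().
_PERCENT_U = re.compile(r"%u([0-9a-fA-F]{4})")


def decode_percent_u(value: str) -> str:
    """Replace every %uXXXX sequence with the character it names."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), value)


result = decode_percent_u("%u54E6")
# Print the code point rather than the character to stay console-safe.
print(f"U+{ord(result):04X}")
```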

AS3 PNG Encoder?

Question: Is there a way to encode bitmap data into PNG for use with the FileReference.save() method in AS3? I assume I need an encoder library. Where can I get a library that encodes BitmapData into PNG? Answer 1: You're looking for: https://github.com/mikechambers/as3corelib There are probably other solutions, but this is widely used and well tested. I've used it myself on several occasions without issue. Answer 2: Shameless plug! I just wrote a new PNG encoder called PNGEncoder2. It's extremely fast, offers

How to handle when some special UTF-8 characters are inside a XML file in matlab

Question: I have several XML files to process. A sample file is given below: <DOC> <DOCNO>2431.eng</DOCNO> <TITLE>The Hot Springs of Baños del Inca near Cajamarca</TITLE> <DESCRIPTION>view of several pools with steaming water; people, houses and trees behind it, and a mountain range in the distant background;</DESCRIPTION> <NOTES>Until 1532 the place was called Pulltumarca, before it was renamed to "Baños del Inca" (baths of the Inka) with the arrival of the Spaniards . Today, Baños del Inca is the most

Encoding.Default not working in Unity Player

Question: I recently ran into a small issue with my app, which basically downloads a string and an image from my website. It works perfectly in the Unity Editor, but when I build and run, it doesn't print the image properly anymore! Here is the important code: byte[] data = sendMessage("https://mywebsite.com", values); string ret = Encoding.Default.GetString(data); var split = ret.Split(new string[] {"//"}, StringSplitOptions.None); byte[] texdata = Encoding.Default.GetBytes(split[1]); varstr =

Encoding in Pig

Question: When I load data that contains certain characters (for example, À, ° and others) using Pig Latin and store it in a .txt file, these symbols are displayed in the file as ï¿½ and ï characters. That happens because of the UTF-8 substitution character. Is it possible to avoid this somehow, maybe with some Pig commands, so that the result (in the txt file) contains, for example, À instead of ï¿½? Answer 1: In Pig we have built-in dynamic invokers that allow a
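The ï¿½ sequence in the question is what the Unicode replacement character U+FFFD looks like when its UTF-8 bytes (EF BF BD) are viewed as ISO-8859-1. The Python sketch below reproduces the symptom under one assumption that the excerpt does not confirm: the input file was written in a single-byte encoding such as ISO-8859-1 and was then read as UTF-8.

```python
# Assumption: the source data is ISO-8859-1, where "À" (U+00C0) is the byte 0xC0.
original = "\u00c0".encode("iso-8859-1")

# 0xC0 is not valid UTF-8, so a lenient decoder substitutes U+FFFD for it.
decoded = original.decode("utf-8", errors="replace")

# Writing U+FFFD back out as UTF-8 yields three bytes ...
stored = decoded.encode("utf-8")

# ... which a viewer assuming ISO-8859-1 shows as three separate characters.
mojibake = stored.decode("iso-8859-1")

print(stored.hex())
```

Once the substitution has happened, the original byte is gone, so any fix has to make the load step read the data with its real encoding.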

Wrong Charset Encoding with Play Framework 2.1

Question: I have a web service that receives a parameter in ISO-8859-1 encoding. But when I try to read it from the request, I get these characters: �� I've tried all these approaches, but none of them converts the given string to the expected one (áéíóú): val a = new String(_html.getBytes()); val b = new String(_html.getBytes(), "UTF-8") val c = new String(_html.getBytes(), "ISO-8859-1") val d = new String(_html.getBytes("ISO-8859-1"), "UTF-8") val e = new String(_html.getBytes("ISO-8859-1"), "ISO
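A likely reason why none of the `getBytes` combinations helps, sketched in Python. It assumes the request bytes were already decoded as UTF-8 before the application code sees them, which the excerpt does not state. In that case the ISO-8859-1 bytes have been replaced by U+FFFD and no re-encoding of the resulting string can recover them.

```python
expected = "\u00e1\u00e9\u00ed\u00f3\u00fa"   # the string áéíóú
wire_bytes = expected.encode("iso-8859-1")    # what the client sent (assumed)

# Wrong charset: these bytes do not form valid UTF-8 sequences,
# so a lenient decoder replaces them with U+FFFD.
garbled = wire_bytes.decode("utf-8", errors="replace")

# Re-encoding the garbled string does not bring the original bytes back.
round_trip = garbled.encode("utf-8")

# Decoding the raw bytes with the right charset is the lossless path.
correct = wire_bytes.decode("iso-8859-1")

print(correct == expected, round_trip == wire_bytes)
```

The practical consequence is that the charset has to be applied where the raw request bytes are first turned into a string, not afterwards.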

How to identify encoding from hex values?

Question: I have text on a website that displays like this: o¨ instead of ö. I extracted the text from the CMS and analysed its hex values: the ö's that are displayed correctly have c3 b6 (UTF-8), while the ö's that are displayed incorrectly have 6f cc 88. I couldn't find out what encoding this is. What's a good way to identify the encoding? Answer 1: 6F is the UTF-8 (ASCII) encoding of "o", nothing spectacular. CC 88 is the UTF-8 encoding of U+0308, COMBINING DIAERESIS. You're simply looking at the decomposed form
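The point made in Answer 1 can be checked with Python's standard `unicodedata` module: both byte sequences are valid UTF-8, and normalizing the decomposed form to NFC yields the precomposed character with the bytes c3 b6. This is a small illustration, not a statement about how the CMS in question should be fixed.

```python
import unicodedata

# The "incorrect" bytes from the question: "o" followed by U+0308.
decomposed = bytes.fromhex("6f cc 88").decode("utf-8")

# NFC composes base letter + combining mark into a single code point.
composed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(ch):04X}" for ch in decomposed])
print(composed.encode("utf-8").hex())
```

Normalizing text to one form (commonly NFC) before storing or comparing it avoids having both variants in the same data set.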

How to fix ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8)

Question: I am getting the following error: [ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8)] Here is the log... Completed 500 Internal Server Error in 318ms Jan 09 23:29:19 burro app/web.1: ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8): Jan 09 23:29:19 burro app/web.1: 97:  Jan 09 23:29:19 burro app/web.1: 98: </tr> Jan 09 23:29:19 burro app/web.1: 99: <% end %> Jan

DocumentBuilder parse produces invalid byte 2 of 4-byte UTF-8 sequence error

Question: I am trying to parse a byte array which contains the string Impresión in XML: final DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); final DocumentBuilder builder = builderFactory.newDocumentBuilder(); final Document document; try (final InputStream stream = new ByteArrayInputStream(bytearray)) { document = builder.parse(stream); // gives Invalid byte 2 of 4-byte UTF-8 sequence error } It produces an "Invalid byte 2 of 4-byte UTF-8 sequence" error. But when I have Unicode
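One plausible cause of this error, sketched in Python. It assumes the byte array was produced in ISO-8859-1 while the parser reads it as UTF-8 (the XML default when no encoding is declared); the excerpt does not confirm either point. In ISO-8859-1 the letter ó is the byte 0xF3, whose bit pattern 11110xxx announces a 4-byte UTF-8 sequence, and the following "n" is not a continuation byte, which matches "invalid byte 2 of 4-byte UTF-8 sequence".

```python
data = "Impresi\u00f3n".encode("iso-8859-1")   # assumed encoding of the byte array

lead = data[7]      # 0xF3, the byte for "ó"
follow = data[8]    # 0x6E, the byte for "n"

# A 4-byte UTF-8 lead byte starts with the bits 11110.
is_four_byte_lead = (lead >> 3) == 0b11110
# A UTF-8 continuation byte starts with the bits 10.
is_continuation = (follow >> 6) == 0b10

try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the bytes were actually written in succeeds.
text = data.decode("iso-8859-1")

print(hex(lead), is_four_byte_lead, is_continuation, utf8_ok)
```

If this is the cause, producing the bytes as UTF-8 or declaring the actual encoding in the XML prolog would be the places to look.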