encoding

Filtering text encoded with utf-8 to only contain latin alphabet characters

随声附和 submitted on 2020-01-05 05:11:08
Question: I'm trying to filter text data so that it only contains Latin characters, for further text analysis. The original text source most likely contained the Korean alphabet. It shows up like this in the text file: \xe7\xac\xac8\xe4\xbd\x8d ONE PIECE FILM GOLD Blu-ray GOLDEN LIMITED EDITION What would be the fastest/easiest/most complete way to remove these? I tried writing a script that would remove all \xXX combinations, but it turns out that there are too many exceptions for this to be reliable. Is there
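
Rather than deleting \xXX runs one by one, it is probably more reliable to turn them back into bytes, decode those bytes as UTF-8, and then filter by Unicode script. Decoding the sample shows that E7 AC AC and E4 BD 8D are the UTF-8 encodings of U+7B2C and U+4F4D (第 and 位), so this particular text is CJK rather than Korean. The sketch below assumes the file contains literal backslash-x escapes and that every other character is plain ASCII; the class and method names are hypothetical. It keeps the LATIN script plus COMMON (digits, spaces, punctuation); note that COMMON also includes some non-ASCII punctuation, so tighten the filter if that matters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LatinFilter {
    // Turn literal backslash-x-hex-hex sequences back into bytes and decode as UTF-8.
    // Assumption: every character that is not part of an escape is ASCII.
    static String decodeHexEscapes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            boolean isEscape = i + 3 < s.length()
                    && s.charAt(i) == '\\' && s.charAt(i + 1) == 'x'
                    && Character.digit(s.charAt(i + 2), 16) >= 0
                    && Character.digit(s.charAt(i + 3), 16) >= 0;
            if (isEscape) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(s.charAt(i)); // ASCII character copied through as one byte
                i++;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Keep code points whose script is LATIN or COMMON; drop everything else (Han, Hangul, ...).
    static String keepLatin(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.LATIN || script == Character.UnicodeScript.COMMON) {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String raw = "\\xe7\\xac\\xac8\\xe4\\xbd\\x8d ONE PIECE FILM GOLD";
        System.out.println(keepLatin(decodeHexEscapes(raw))); // 8 ONE PIECE FILM GOLD
    }
}
```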

Conversion from javascript-escaped Unicode to Java Unicode

无人久伴 submitted on 2020-01-05 03:36:30
Question: I have a query string passed in through an HTTP request that contains this character: %u54E6. I'd like to generate a string that contains the actual Chinese character so I can use it in a different part of the application. I've tried using this code: String foo = "%u54E6"; String ufoo = new String(foo.replaceAll("%u([a-zA-Z0-9]{4})", "\\" + "u$1")); System.out.println("ufoo: " + ufoo); Unfortunately, all I'm getting is 'u54E6' printed to the console for the value, instead of the Chinese
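
The attempt in the question cannot work because \uXXXX escapes are translated by the Java compiler when it reads source code, not by String at run time. In addition, a backslash in a replaceAll replacement string escapes the character after it, so "\\u$1" produces just "u54E6". A minimal sketch is to parse the four hex digits yourself. The class name is hypothetical; the pattern handles only the %uXXXX form produced by JavaScript's escape(), not its %XX form. Surrogate pairs arrive as two %u sequences and are rejoined simply by appending both chars.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PercentUDecoder {
    // Matches %u followed by exactly four hex digits (one UTF-16 code unit).
    private static final Pattern PERCENT_U = Pattern.compile("%u([0-9a-fA-F]{4})");

    static String decodePercentU(String s) {
        Matcher m = PERCENT_U.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Convert the hex digits at run time into the char they name.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' or backslash in the result from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String ufoo = decodePercentU("%u54E6");
        System.out.println("ufoo: " + ufoo);
    }
}
```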

AS3 PNG Encoder?

[亡魂溺海] submitted on 2020-01-05 03:03:37
Question: Is there a way to encode BitmapData into PNG for use with the FileReference.save() method in AS3? I assume I need an encoder library. Where can I get a library that encodes BitmapData into PNG? Answer 1: You're looking for as3corelib (https://github.com/mikechambers/as3corelib). There are probably other solutions, but this one is widely used and well tested. I've used it myself on several occasions without issue. Answer 2: Shameless plug! I just wrote a new PNG encoder called PNGEncoder2. It's extremely fast, offers

How to handle when some special UTF-8 characters are inside a XML file in matlab

≡放荡痞女 submitted on 2020-01-05 02:24:07
Question: I have several XML files to process. A sample file is given below: <DOC> <DOCNO>2431.eng</DOCNO> <TITLE>The Hot Springs of Baños del Inca near Cajamarca</TITLE> <DESCRIPTION>view of several pools with steaming water; people, houses and trees behind it, and a mountain range in the distant background;</DESCRIPTION> <NOTES>Until 1532 the place was called Pulltumarca, before it was renamed to "Baños del Inca" (baths of the Inka) with the arrival of the Spaniards . Today, Baños del Inca is the most

Encoding.Default not working in Unity Player

不打扰是莪最后的温柔 submitted on 2020-01-04 20:29:54
Question: I recently ran into a small issue with my app, which downloads a string and an image from my website. It works perfectly in the Unity Editor, but when I build and run it, the image is no longer displayed properly! Here is the important code: byte[] data = sendMessage("https://mywebsite.com", values); string ret = Encoding.Default.GetString(data); var split = ret.Split(new string[] {"//"}, StringSplitOptions.None); byte[] texdata = Encoding.Default.GetBytes(split[1]); varstr =
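
The code above pushes image bytes through Encoding.Default.GetString and back through GetBytes. Decoding arbitrary binary data as text is not guaranteed to be reversible, and which charset Encoding.Default means can differ between runtimes; that might explain why the editor and the built player behave differently, though the question does not confirm it. The same pitfall is sketched here in Java, with UTF-8 standing in for the platform default. The PNG signature bytes are real; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryThroughString {
    // The 8-byte signature at the start of every PNG file.
    static final byte[] PNG_SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Lossy: 0x89 on its own is not valid UTF-8, so it decodes to U+FFFD
    // and the re-encoded array no longer matches the original bytes.
    static boolean survivesUtf8RoundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, asText.getBytes(StandardCharsets.UTF_8));
    }

    // Lossless: Base64 maps any bytes to ASCII text and back exactly.
    static boolean survivesBase64RoundTrip(byte[] data) {
        String asText = Base64.getEncoder().encodeToString(data);
        return Arrays.equals(data, Base64.getDecoder().decode(asText));
    }

    public static void main(String[] args) {
        System.out.println(survivesUtf8RoundTrip(PNG_SIGNATURE));   // false
        System.out.println(survivesBase64RoundTrip(PNG_SIGNATURE)); // true
    }
}
```

If the image is Base64-encoded on the server, note that the standard Base64 alphabet contains '/', so "//" can appear inside the encoded image and is not a safe separator; a character outside the alphabet, or sending the image as a separate binary response, avoids that.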

Encoding in Pig

谁都会走 submitted on 2020-01-04 15:17:06
Question: When I load data that contains certain characters (for example À, ° and others) using Pig Latin and store it in a .txt file, these symbols appear in the file as � and ï characters. This happens because they are replaced with the Unicode replacement character. Is it possible to avoid this, perhaps with some Pig commands, so that the result (in the .txt file) contains, for example, À instead of �? Answer 1: In Pig we have built-in dynamic invokers that allow a
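
The question does not say how the input files are encoded. One plausible cause, sketched here in Java rather than Pig, is an ISO-8859-1 input read as UTF-8. The invalid bytes become U+FFFD (shown as �); when that character is written back out as UTF-8 it becomes EF BF BD, which a viewer assuming Latin-1 renders as ï¿½, which would also account for the stray ï. This only illustrates the mechanism; it is not a Pig command, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // A-grave followed by the degree sign, stored as ISO-8859-1 bytes C0 B0
    // (escapes keep this source file ASCII-only).
    static final byte[] LATIN1_INPUT = "\u00C0\u00B0".getBytes(StandardCharsets.ISO_8859_1);

    // Mismatch: C0 and B0 are not valid UTF-8 here, so they are replaced with U+FFFD.
    static String readAsUtf8() {
        return new String(LATIN1_INPUT, StandardCharsets.UTF_8);
    }

    // Writes text as UTF-8 (U+FFFD becomes EF BF BD) and shows how a viewer that
    // assumes ISO-8859-1 renders those bytes: i-diaeresis, inverted question mark, one half.
    static String viewUtf8AsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Match: decoding with the charset the file was actually written in keeps the text.
    static String readAsLatin1() {
        return new String(LATIN1_INPUT, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(readAsUtf8().contains("\uFFFD"));                                 // true
        System.out.println(viewUtf8AsLatin1(readAsUtf8()).startsWith("\u00EF\u00BF\u00BD")); // true
        System.out.println(readAsLatin1().equals("\u00C0\u00B0"));                           // true
    }
}
```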

Wrong Charset Encoding with Play Framework 2.1

拟墨画扇 submitted on 2020-01-04 14:22:12
Question: I have a web service that receives a parameter in ISO-8859-1 encoding. But when I try to read it from the request, I get these characters: ����� I've tried all these approaches, but none of them converts the given string to the expected one (áéíóú): val a = new String(_html.getBytes()); val b = new String(_html.getBytes(), "UTF-8") val c = new String(_html.getBytes(), "ISO-8859-1") val d = new String(_html.getBytes("ISO-8859-1"), "UTF-8") val e = new String(_html.getBytes("ISO-8859-1"), "ISO
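
All of these attempts start from a string that has already been decoded. If ISO-8859-1 bytes were decoded as UTF-8, every accented letter became U+FFFD and the original byte values are gone, so no getBytes/new String combination can restore them. The sketch below, in Java (Scala can call the same APIs), shows the loss and the fix of decoding the original bytes with ISO-8859-1. How to obtain the raw request bytes in Play 2.1 is not covered here; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class LostBytesDemo {
    // The five accented vowels from the question, as escapes so the source stays ASCII.
    static final String EXPECTED = "\u00e1\u00e9\u00ed\u00f3\u00fa";
    // What the client sends: one ISO-8859-1 byte per character (E1 E9 ED F3 FA).
    static final byte[] RAW = EXPECTED.getBytes(StandardCharsets.ISO_8859_1);

    // The mistake: none of these bytes forms a valid UTF-8 sequence here,
    // so the decoded string consists only of U+FFFD.
    static String decodedAsUtf8() {
        return new String(RAW, StandardCharsets.UTF_8);
    }

    // Attempt "d" from the question: U+FFFD has no ISO-8859-1 byte, so getBytes
    // substitutes '?' and the result is only question marks.
    static String attemptedRepair() {
        return new String(decodedAsUtf8().getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // The fix: decode the original bytes with the charset they were sent in.
    static String decodedAsLatin1() {
        return new String(RAW, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(attemptedRepair());                  // prints only question marks
        System.out.println(decodedAsLatin1().equals(EXPECTED)); // true
    }
}
```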

How to identify encoding from hex values?

强颜欢笑 submitted on 2020-01-04 06:03:48
Question: I have text on a website that displays like this: o¨ instead of ö. I extracted the text from the CMS and analysed its hex values: the ö's that are displayed correctly are c3 b6 (UTF-8), while the ö's that are displayed incorrectly are 6f cc 88. I couldn't find out what encoding this is. What's a good way to identify the encoding? Answer 1: 6F is the UTF-8 (ASCII) encoding of "o", nothing spectacular. CC 88 is the UTF-8 encoding of U+0308, COMBINING DIAERESIS. You're simply looking at the decomposed form
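
Following the answer, both byte sequences are UTF-8; they differ only in Unicode normalization form. 6f cc 88 is "o" plus U+0308 (decomposed), while c3 b6 is the precomposed U+00F6. Normalizing to NFC composes the pair, which could be applied to the CMS text before output, assuming the display problem comes only from the decomposed form. A minimal Java sketch using java.text.Normalizer follows; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NfcDemo {
    // Format the UTF-8 bytes of a string as lowercase hex pairs, e.g. "6f cc 88".
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Compose base letters and combining marks into precomposed characters where possible.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "o\u0308"; // 'o' followed by U+0308 COMBINING DIAERESIS
        System.out.println(hex(decomposed));        // 6f cc 88
        System.out.println(hex(toNfc(decomposed))); // c3 b6
    }
}
```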

How to fix ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8)

倾然丶 夕夏残阳落幕 submitted on 2020-01-04 06:02:48
Question: I am getting the following error [ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8)] Here is the log... Completed 500 Internal Server Error in 318ms Jan 09 23:29:19 burro app/web.1: ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8): Jan 09 23:29:19 burro app/web.1: 97: <!-- <td><%= row.notes.gsub("\n", "<br>").html_safe %></td> --> Jan 09 23:29:19 burro app/web.1: 98: </tr> Jan 09 23:29:19 burro app/web.1: 99: <% end %> Jan

DocumentBuilder parse produces invalid byte 2 of 4-byte UTF-8 sequence error

回眸只為那壹抹淺笑 submitted on 2020-01-04 05:51:13
Question: I am trying to parse a byte array containing XML that includes the string Impresión: final DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); final DocumentBuilder builder = builderFactory.newDocumentBuilder(); final Document document; try (final InputStream stream = new ByteArrayInputStream(bytearray)) { document = builder.parse(stream); // gives Invalid byte 2 of 4-byte UTF-8 sequence error } It produces an "Invalid byte 2 of 4-byte UTF-8 sequence" error. But when I have Unicode
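
Assuming the byte array was produced with ISO-8859-1 (or a similar single-byte charset) rather than UTF-8, the message fits exactly. Without an encoding declaration or BOM, an XML parser reading bytes assumes UTF-8; "ó" is the single byte 0xF3, which in UTF-8 starts a 4-byte sequence, and the next byte ("n", 0x6E) is not a valid continuation byte. The cleanest fix is to produce the bytes as UTF-8 in the first place, or to declare the real encoding in the XML prolog; the sketch below shows a third option, decoding with the known charset and parsing characters. The sample XML and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEncodingDemo {
    // Hypothetical document with no encoding declaration; U+00F3 is o-acute.
    static final String XML = "<doc><text>Impresi\u00f3n</text></doc>";
    // The bytes as a Latin-1 producer would write them: o-acute is the single byte F3.
    static final byte[] LATIN1_BYTES = XML.getBytes(StandardCharsets.ISO_8859_1);

    // Parses the source and returns the root element's text, or null if parsing fails.
    static String rootText(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            System.out.println("Parse failed: " + e.getMessage());
            return null;
        }
    }

    // Byte input: the parser detects no declaration and falls back to UTF-8.
    static String parseAsUtf8(byte[] bytes) {
        return rootText(new InputSource(new ByteArrayInputStream(bytes)));
    }

    // Character input: the bytes are decoded with the known charset before parsing.
    static String parseWithCharset(byte[] bytes, Charset charset) {
        return rootText(new InputSource(new StringReader(new String(bytes, charset))));
    }

    public static void main(String[] args) {
        System.out.println(parseAsUtf8(LATIN1_BYTES)); // null, after printing the parse error
        System.out.println(XML.equals("<doc><text>" + parseWithCharset(LATIN1_BYTES, StandardCharsets.ISO_8859_1) + "</text></doc>")); // true
    }
}
```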