utf-8 | 易学教程

Encoding in Pig

Question: When loading data that contains certain characters (for example À, ° and others) with Pig Latin and storing it in a .txt file, those symbols are displayed in the file as ï¿½ and ï. This happens because of the UTF-8 substitution character. Is it possible to avoid this somehow, perhaps with some Pig commands, so that the result (in the txt file) contains, for example, À instead of ï¿½? Answer 1: In Pig we have built-in dynamic invokers that allow a
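
The ï¿½ sequence described in the question is consistent with the UTF-8 bytes of the Unicode replacement character (U+FFFD) being viewed through a single-byte encoding. A minimal Python sketch of that effect, assuming the viewer decodes the file as Latin-1 (the question does not say which encoding the viewer actually uses):

```python
# U+FFFD is the Unicode replacement character that decoders emit
# when they meet bytes they cannot interpret.
replacement = "\ufffd"

# Its UTF-8 encoding is the three bytes EF BF BD.
utf8_bytes = replacement.encode("utf-8")

# Assumption: the text viewer interprets the file as Latin-1,
# so each byte becomes its own character.
shown = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex())  # efbfbd
print(shown)             # ï¿½
```

If this is what happened, the original À was already replaced when the data was decoded, so any fix would have to be applied at load time rather than in the text viewer.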

Transforming string to UTF8

Question: I have a string that I receive from email via C#, and I want to display it in the correct format. I know the encoding is coming in as Encoding.Default. According to this answer I have to convert it to UTF-8, so I tried this code: byte[] bytes = Encoding.Default.GetBytes(input); string strResult = Encoding.UTF8.GetString(bytes); It works, but it can't convert some characters. In the web mail interface the original string is: باسلام همکار گرامی شماره 53018 مربوط به دبیرخانه ستاد می باشد لطفا
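
The two C# calls in the question turn wrongly decoded text back into bytes and decode those bytes as UTF-8. A rough Python analogue, assuming the wrongly applied encoding was Windows-1252 (an assumption: Encoding.Default depends on the machine's ANSI code page). The helper name `reinterpret_as_utf8` and the sample word are illustrative only:

```python
def reinterpret_as_utf8(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo a wrong decode: turn the text back into bytes, then decode as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "سلام"  # hypothetical sample, not the string from the question

# Simulate the damage: UTF-8 bytes decoded with the wrong (assumed) encoding.
garbled = original.encode("utf-8").decode("cp1252")

restored = reinterpret_as_utf8(garbled)
print(restored == original)
```

This round trip only succeeds when every byte survived the wrong decode unchanged; a byte that the wrong encoding could not represent cannot be recovered afterwards.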

charsets in MySQL replication

Question: What can I do to ensure that replication will use latin1 instead of utf-8? I'm migrating between a MySQL 5.1.22 server (master) on a Linux system and a MySQL 5.1.42 server (slave) on a FreeBSD system. My replication works well, but when non-ASCII characters are in my varchars, they turn "weird". The Linux/MySQL-5.1.22 server shows the following character set variables: character_set_client=latin1 character_set_connection=latin1 character_set_database=latin1 character_set_filesystem=binary character

OpenCV imread with foreign characters

Question: We're working on a project using OpenCV 2.4.6 and Qt 5.1.1 in C++. We have to load images for image processing at several points in our code, which we did using cv::imread, as normal. However, we wanted to make the software compatible with filesystems in other languages, and found that file paths with foreign characters would fail to load. The problem, we believe, has to do with the fact that imread can only take in a std::string (or char*), and casting a path with non-Latin-1 symbols to a

Prevent BeautifulSoup's renderContents() from changing to Â

Question: I'm using bs4 to do some work on some text, but in some cases it converts characters to Â. As best I can tell, this is an encoding mismatch from UTF-8 to latin1 (or the reverse?). Everything in my web app is UTF-8, Python 3 is UTF-8, and I've confirmed the database is UTF-8. I've narrowed down the problem to this one line: print("Before soup: " + text) # Before soup: soup = BeautifulSoup(text, "html.parser") #.... do stuff to soup, but all commented out for this testing. soup =
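
A stray Â often appears when UTF-8 bytes are decoded as Latin-1. A minimal sketch of that mismatch, using a non-breaking space as a hypothetical trigger (the question does not show the actual input, so this is only one possible cause):

```python
# Hypothetical input: a non-breaking space (U+00A0) between two tokens.
text = "10\u00a0km"

# UTF-8 encodes U+00A0 as the two bytes C2 A0.
utf8_bytes = text.encode("utf-8")

# Decoding those bytes as Latin-1 turns C2 into "Â" and keeps A0 as U+00A0.
misread = utf8_bytes.decode("latin-1")

print(repr(misread))  # '10Â\xa0km'
```

Checking the `repr` of the text before and after each step is a quick way to see at which point the extra character is introduced.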

Windows UTF-8 printed with chcp 65001 - characters are mysteriously duplicated

Question: Here is one thing I can't get my head around: I am using Windows 7 and Strawberry Perl 5.20, and I want to write UTF-8 to the console (cmd.exe) with chcp 65001. The UTF-8 characters themselves come out fine, even those above 255, but some characters are mysteriously duplicated (this only happens if I don't redirect into a file). EDIT: I have now seen another post that had essentially the same problem at last-octet-repeated-when-my-perl-program-outputs-a-utf-8 -- the solution is to inject a

How to fix ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8)

Question: I am getting the following error: [ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8)]. Here is the log... Completed 500 Internal Server Error in 318ms Jan 09 23:29:19 burro app/web.1: ActionView::Template::Error (incompatible character encodings: ASCII-8BIT and UTF-8): Jan 09 23:29:19 burro app/web.1: 97:  Jan 09 23:29:19 burro app/web.1: 98: </tr> Jan 09 23:29:19 burro app/web.1: 99: <% end %> Jan

HTML, XHTML validation error - can't resolve

Question: I have been trying to validate my web page for the last two hours. I have only one error remaining before it validates successfully, but I keep getting a character decoding problem that I cannot get around. The whole document is fine except it says: Sorry, I am unable to validate this document because on line 77 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check
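
To locate the bytes a validator is complaining about, one option is to decode the file line by line and report where decoding fails. A small sketch; the function name `find_invalid_utf8` and the sample bytes are illustrative and not part of the question:

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset, byte_value) for each line that is not valid UTF-8.

    Only the first undecodable byte of each offending line is reported.
    """
    problems = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((line_number, exc.start, line[exc.start]))
    return problems


# Hypothetical page content: line 2 holds "café" saved as Latin-1 (byte 0xE9).
sample = b"ok\ncaf\xe9\n"
for line_number, offset, value in find_invalid_utf8(sample):
    print(f"line {line_number}, byte offset {offset}: 0x{value:02x}")
```

For a real page, the bytes would be read with `open(path, "rb").read()` so that no decoding happens before the check.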

DocumentBuilder parse produces invalid byte 2 of 4-byte UTF-8 sequence error

Question: I am trying to parse a byte array that contains the string Impresión in XML: final DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); final DocumentBuilder builder = builderFactory.newDocumentBuilder(); final Document document; try (final InputStream stream = new ByteArrayInputStream(bytearray)) { document = builder.parse(stream); // gives Invalid byte 2 of 4-byte UTF-8 sequence error } It produces an Invalid byte 2 of 4-byte UTF-8 sequence error. But when I have Unicode
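
This error is what one would expect if the byte array holds Impresión encoded as Latin-1 (ISO-8859-1) while the parser assumes UTF-8. That is an assumption, since the question does not show how the array was built. In Latin-1, ó is the single byte 0xF3, which UTF-8 reads as the lead byte of a 4-byte sequence, and the following n is not a valid continuation byte. A Python sketch of the same failure:

```python
raw = "Impresión".encode("latin-1")  # assumed encoding of the byte array

lead = raw[7]                        # the byte that stands for "ó"
print(hex(lead))                     # 0xf3
print(lead >> 3 == 0b11110)          # True: 11110xxx starts a 4-byte UTF-8 sequence

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("decode failed at byte offset", exc.start)
```

Under that assumption, the usual remedies are to produce the bytes as UTF-8 before parsing or to declare the real encoding in the XML prolog.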

Python concatenating strings - UnicodeDecodeError: 'ascii' codec can't decode byte

Question: I want to concatenate two strings like this: requestData = command + ' ' + data. In my case "data" holds binary data that should not be opened; it should just be glued to command. But it seems Python is attempting to open it, and it fails with: UnicodeDecodeError: 'ascii' codec can't decode byte 0xbc in position 1: ordinal not in range(128). Is there a way to glue it without opening? Edit: Python 2.7. Also, my data is actually not UTF-8, so decode might not help; it's binary data. Answer 1: Try using http:/
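
One way to glue the pieces together without any implicit decoding is to make sure both sides are byte strings before concatenating. A sketch, assuming command is a text string containing only ASCII and data is raw bytes; the sample values are made up:

```python
command = u"SEND"          # hypothetical text command
data = b"\x01\xbc\xff"     # hypothetical binary payload

# Encode the text side explicitly so that only bytes are concatenated
# and the binary payload is never decoded.
request_data = command.encode("ascii") + b" " + data

print(repr(request_data))
```

In Python 2.7 the original error typically arises when one operand is a unicode object and the other is a str holding non-ASCII bytes, because Python then decodes the str with the ascii codec behind the scenes.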