utf | 易学教程

Convert UTF-16 to UTF-8

I am currently using VC++ 2008 with MFC. Because PostgreSQL doesn't support UTF-16 (the encoding Windows uses for Unicode), I need to convert strings from UTF-16 to UTF-8 before storing them. Here is my code snippet.

    // demo.cpp : Defines the entry point for the console application.
    //
    #include "stdafx.h"
    #include "demo.h"
    #include "Utils.h"
    #include <iostream>

    #ifdef _DEBUG
    #define new DEBUG_NEW
    #endif

    // The one and only application object
    CWinApp theApp;

    using namespace std;

    int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
    {
        int nRetCode = 0;

        // initialize MFC and print an error on failure
        if (
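
The transformation itself does not depend on MFC. As a minimal sketch in Python (not the asker's C++ code; the helper name `utf16le_to_utf8` is hypothetical), decoding little-endian UTF-16 bytes, the byte order Windows uses, and re-encoding them as UTF-8 could look like this:

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Transcode little-endian UTF-16 bytes to UTF-8 bytes."""
    # decode() turns UTF-16 code units (including surrogate pairs) into
    # code points; encode() then serializes those code points as UTF-8.
    return raw.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9".encode("utf-16-le")  # stand-in for a wide string
    print(utf16le_to_utf8(sample))
```

In native Windows code the same step is usually done with the Win32 function WideCharToMultiByte and the CP_UTF8 code page.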

How to print degree symbol on the window using qt5(QtQuick 2.1) and above

Up to Qt 4.8 (Qt Quick 1.1) I could print the degree symbol in my GUI with \260, but after upgrading to Qt 5 and above this stopped working. I searched the net and found many relevant links, such as http://www.fileformat.info/info/unicode/char/00b0/index.htm, and tried what they suggest, but nothing helped. Do I need to include some library to use a UTF format, or is the problem something else? Could someone please help?

Revised: here is a description of what is being done. First I store the printable statement in the string text, as in this C++ function: sprintf(text, "%02d\260 %03d\260
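
One plausible reading of the symptom, offered as an assumption rather than a confirmed diagnosis: \260 is the octal escape for the single byte 0xB0, which is the degree sign in Latin-1 but is not a valid sequence on its own in UTF-8. The Python sketch below only illustrates that byte-level difference; it does not use Qt.

```python
raw = b"\260"  # the octal escape \260 is the single byte 0xB0

# Interpreted as Latin-1, 0xB0 is the degree sign U+00B0.
as_latin1 = raw.decode("latin-1")

# Interpreted as UTF-8, a lone 0xB0 is a stray continuation byte, so strict
# decoding fails; errors="replace" substitutes U+FFFD instead.
as_utf8 = raw.decode("utf-8", errors="replace")

# The UTF-8 encoding of the degree sign is the two bytes 0xC2 0xB0.
degree_utf8 = "\u00b0".encode("utf-8")

print(repr(as_latin1), repr(as_utf8), degree_utf8)
```

Under that assumption, a source string that is decoded as UTF-8 would need the two bytes \302\260 to produce the degree sign.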

Python read from file and remove non-ascii characters

I have the following program that reads a file word by word and writes each word to another file, but without the non-ASCII characters from the first file.

    import unicodedata
    import codecs

    infile = codecs.open('d.txt', 'r', encoding='utf-8', errors='ignore')
    outfile = codecs.open('d_parsed.txt', 'w', encoding='utf-8', errors='ignore')
    for line in infile.readlines():
        for word in line.split():
            outfile.write(word + " ")
        outfile.write("\n")
    infile.close()
    outfile.close()

The only problem I am facing is that this code does not write a newline to the second file (d_parsed). Any clues?
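
As a hedged sketch of one way to get both effects (dropping non-ASCII characters and keeping one output line per input line), the helpers below use in-memory streams in place of d.txt and d_parsed.txt. The function names are hypothetical and this is not the asker's final code.

```python
import io


def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range (code points 0-127)."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def copy_ascii_only(infile, outfile) -> None:
    """Copy text line by line, writing one output line per input line."""
    for line in infile:
        words = [strip_non_ascii(word) for word in line.split()]
        # Words that were entirely non-ASCII become empty strings; skip them.
        outfile.write(" ".join(w for w in words if w) + "\n")


if __name__ == "__main__":
    src = io.StringIO("caf\u00e9 na\u00efve\nplain text\n")
    dst = io.StringIO()
    copy_ascii_only(src, dst)
    print(dst.getvalue(), end="")
```

With real files, the same function can be passed file objects opened with an explicit encoding, for example `open('d.txt', encoding='utf-8')`.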

how to determine text encoding

I know that UTF files can carry a BOM that identifies the encoding, but what about other encodings, which give no such clue? I am a new Java programmer. I have written code that guesses UTF encodings from the BOM, but I have a problem with other encodings. How do I guess them? Can anybody help me? Thanks in advance.

Todd Owen: This question is a duplicate of several previous ones. There are at least two libraries for Java that attempt to guess the encoding (although keep in mind that there is no way to guess right 100% of the time): GuessEncoding; jchardet (Java port of the algorithm used by mozilla
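
The BOM check the question mentions can be sketched as follows. This is an illustrative Python version rather than the asker's Java code, and the name `sniff_bom` is hypothetical. It returns None when no BOM is present, which is the case where only statistical guessing remains.

```python
import codecs

# Longer BOMs come first: the UTF-32-LE BOM starts with the same two bytes
# as the UTF-16-LE BOM, so the order of the checks matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(sniff_bom(codecs.BOM_UTF8 + b"hello"))
    print(sniff_bom(b"no bom here"))
```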

jsp utf encoding

Question: I'm having a hard time figuring out how to handle this problem. I'm developing a web tool for an Italian university, and I have to display words with accents (such as è, ù, ...). Sometimes I get these words from a PostgreSQL table (UTF-8 encoded), but mostly I have to read long passages from files. These files are encoded as UTF-8 XML and display fine in Smultron or any UTF-8 editor (they were created by parsing, in Python, old files with entities such as &egrave; instead of "è"). I wrote a Java class

Replace éàçè… with equivalent “eace” In GWT

Question: I tried s=Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", ""); but it seems that the GWT API doesn't provide such a function. I also tried s=s.replace("é",e); but it doesn't work either. The scenario: I am trying to generate a token from the clicked Widget's text for history management.

Answer 1: You can take the ASCII folding filter from Lucene and add it to your project. You can just take the foldToASCII() method from ASCIIFoldingFilter (the method does not have any dependencies).
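
For readers unfamiliar with what the Normalizer line in the question does, here is a small Python sketch of the same idea: decompose to NFD, then drop everything outside ASCII. It is an illustration only; it is neither GWT code nor Lucene's foldToASCII(), and `fold_accents` is a hypothetical name.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip accents by decomposing to NFD and dropping non-ASCII characters."""
    # NFD splits "é" into "e" followed by the combining acute accent U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Keeping only code points below 128 mirrors the [^\p{ASCII}] filter.
    return "".join(ch for ch in decomposed if ord(ch) < 128)


if __name__ == "__main__":
    print(fold_accents("\u00e9\u00e0\u00e7\u00e8"))
```

Characters with no ASCII base letter in their decomposition are removed entirely by this approach, which is one reason a lookup-table method such as the one in the answer can be preferable.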

Why is sys.getdefaultencoding() different from sys.stdout.encoding and how does this break Unicode strings?

Question: I spent a few angry hours looking for a problem with Unicode strings, which came down to something that Python (2.7) hides from me and that I still don't understand. First, I tried to use u".." strings consistently in my code, but that resulted in the infamous UnicodeEncodeError. I tried using .encode('utf8'), but that didn't help either. Finally, it turned out I shouldn't use either, and it all works out automagically. However, I (here I need to give credit to a friend who helped me) did

Convert Unicode code points to UTF-8 and UTF-32

I can't think of a way to remove the leading zeros. My goal was to create, in a for loop, the UTF-8 and UTF-32 versions of each number. For example, with UTF-8, wouldn't I have to remove the leading zeros? Does anyone have a solution for how to pull this off? Basically, what I am asking is: does someone have an easy solution to convert Unicode code points to UTF-8?

    for (i = 0x0; i < 0xffff; i++) {
        printf("%#x \n", i);
        // convert to UTF8
    }

So here is an example of what I am trying to accomplish for each i. For example, the Unicode value U+0760 (base 16) would convert to UTF-8 as, in binary: 1101 1101
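
A hedged sketch of the conversion being asked about, written in Python rather than the asker's C, with hypothetical function names. The leading zeros are not stripped as a separate step: the size of the code point selects how many bytes are needed, and shifts and masks place the remaining bits into the fixed UTF-8 byte templates.

```python
def _check(cp: int) -> None:
    """Reject values that are not Unicode scalar values."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")


def encode_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8 using the standard bit layout."""
    _check(cp)
    if cp < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_utf32_be(cp: int) -> bytes:
    """Encode one code point as big-endian UTF-32: the value in 4 bytes."""
    _check(cp)
    return cp.to_bytes(4, "big")


if __name__ == "__main__":
    cp = 0x0760
    print(" ".join(f"{b:08b}" for b in encode_utf8(cp)))
    print(encode_utf32_be(cp).hex())
```

For U+0760 the first UTF-8 byte is 11011101, which matches the "1101 1101" in the question's example.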

What's the point of UTF-16?

Question: I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit), then you need UTF-32, since UTF-16 is still variable length. If you don't need this, then UTF-16 seems like a colossal waste of space compared to UTF-8. What are the advantages of UTF-16 over UTF-8 and UTF-32, and why do Windows and Java use it as their native encoding?

Answer 1: When Windows NT was designed, UTF-16 didn't exist (NT 3.51 was born
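
The "still variable length" point in the question can be checked directly. The Python sketch below uses hypothetical helper names and is only an illustration: it counts UTF-16 code units and compares byte sizes across the three encodings for a few sample characters.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def encoded_sizes(text: str) -> dict:
    """Byte length of text in each encoding (variants without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # An ASCII letter, an accented Latin letter, a CJK ideograph, and a
    # character outside the Basic Multilingual Plane.
    for sample in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        print(f"U+{ord(sample):04X}",
              utf16_code_units(sample),
              encoded_sizes(sample))
```

A character outside the Basic Multilingual Plane, such as U+1F600, takes two UTF-16 code units (a surrogate pair), so indexing by code unit is not the same as indexing by code point.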