character-encoding

Parsing Chinese characters in Java showing weird behaviour

╄→尐↘猪︶ㄣ submitted on 2020-01-05 08:27:23
Question: I have a CSV file in which some fields contain Chinese character strings. Unfortunately I don't know the encoding of this input CSV file. I am trying to read this input CSV and, using selected fields from it, produce an HTML file and another CSV file as output. While reading the CSV input, I tried every encoding from the list at http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html that mentions Chinese in its description, and found that if I use InputStreamReader read =
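One way to narrow down an unknown encoding is to decode the raw bytes with a strict `CharsetDecoder` for each candidate: a decoder configured to REPORT errors throws on bytes that are invalid for that charset, so a clean decode is a strong hint, though not conclusive (GBK, for instance, accepts most two-byte sequences). A minimal sketch, with a hard-coded GBK byte sample standing in for the real CSV bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.*;
import java.util.List;

public class DetectCsvCharset {
    // Strict decoding: malformed bytes raise an exception instead of being
    // silently replaced, so a clean decode suggests the charset fits.
    static Charset guess(byte[] data, List<String> candidates) {
        for (String name : candidates) {
            CharsetDecoder dec = Charset.forName(name).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                dec.decode(ByteBuffer.wrap(data));
                return dec.charset();
            } catch (CharacterCodingException e) {
                // invalid for this charset; try the next candidate
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // GBK-encoded bytes for 你好, a stand-in for the CSV contents
        byte[] sample = { (byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3 };
        Charset cs = guess(sample, List.of("UTF-8", "GBK", "GB18030", "Big5"));
        System.out.println(cs); // UTF-8 rejects these bytes; GBK decodes cleanly
    }
}
```

Once a candidate survives, the same name can be passed to the `InputStreamReader` the question starts to set up.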

C++ sending string over socket

无人久伴 submitted on 2020-01-05 07:51:11
Question: I'm doing a small client/server project as the final project in a C++ course. We are handed some classes that take care of communication (using sys/socket.h), and we can basically call connection->send(byte) to send one byte of data. Say I have a string that I want to send. How do I make sure an 'a' is interpreted as an 'a' when sent from client to server or vice versa? Since the standard doesn't say whether char defaults to unsigned or signed, I don't know how to handle it. I had some

MySQL "key was too long" issue

 ̄綄美尐妖づ submitted on 2020-01-05 07:49:20
Question: I was trying to import my backup after I changed the charset from latin1 to utf8, and during the import I got this error: ERROR 1071 (42000) at line 2137: Specified key was too long; max key length is 1000 bytes. I tried changing my.cnf to set all charsets and connections to utf8, but no luck. I don't want to go back to latin1 (I know that would fix the issue), but utf8 is my goal. Any clue? I know latin1 is 1 byte = 1 char and utf8 is up to 3 bytes = 1 char... Answer 1: Can you switch from MyISAM to InnoDB? It seems
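The answer's suggestion can be sketched as SQL (table, index, and column names here are placeholders, not from the question):

```sql
-- Option 1: convert the table to InnoDB, whose per-column key limit
-- (767 bytes by default, 3072 with innodb_large_prefix) is enforced
-- differently from MyISAM's 1000-byte total key limit.
ALTER TABLE mytable ENGINE = InnoDB;

-- Option 2: keep MyISAM and index only a prefix of the long column,
-- so the utf8 worst case (3 bytes per character) stays under the limit.
ALTER TABLE mytable DROP INDEX idx_name,
                    ADD INDEX idx_name (long_col(250));
```

The root cause is that after the latin1-to-utf8 conversion each indexed character is budgeted at 3 bytes, so an index that fit before now exceeds the engine's key length limit.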

Chinese character in source code when UTF-8 settings can't be used [duplicate]

我与影子孤独终老i submitted on 2020-01-05 07:45:13
Question: This question already has an answer here: PHP and C++ for UTF-8 code unit in reverse order in Chinese character (1 answer). Closed 6 years ago. This is the scenario: I can only use the char* data type for the string, not wchar_t*. My MS Visual C++ compiler has to be set to MBCS, not UNICODE, because the third-party source code that I have uses MBCS; setting it to UNICODE would cause data type issues. I am trying to print Chinese characters on a printer which needs to get a character string

Obfuscate a Python script in Unicode escape sequences

大憨熊 submitted on 2020-01-05 05:37:10
Question: I want to obfuscate a Python script by using Unicode escape sequences. For example, print("Hello World") in escape sequences is: \x70\x72\x69\x6e\x74\x28\x22\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x22\x29 From my command line, I can achieve this with: $ python3 -c \x70\x72\x69\x6e\x74\x28\x22\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x22\x29 Hello World I've created a file and put the "Hello World" escape sequence in it as the source code. But when I run it, I get: $
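A \xNN escape sequence is only interpreted inside a string literal, so a bare sequence at the top level of a .py file is not valid source. The usual workaround is to keep the escaped text in a string and exec it; a minimal sketch:

```python
# The escapes must live inside a string literal; Python decodes
# "\x70\x72..." back to the original source text, and exec runs it.
obfuscated = "\x70\x72\x69\x6e\x74\x28\x22\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x22\x29"
assert obfuscated == 'print("Hello World")'
exec(obfuscated)  # prints: Hello World
```

Note this is very weak obfuscation: printing the string, or reading the file, reveals the source immediately.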

Emacs encoding of pasted text

懵懂的女人 submitted on 2020-01-05 04:42:29
Question: I'm using (occasionally) emacs24.3 on Windows 8 and I have an encoding problem when pasting text: non-ASCII characters are replaced by their \u codes. My Emacs coding default is utf-8. I had a quick look in the apropos help, but nothing matched the keywords I used. What would be the variable to configure, and what is the value of the system encoding for a Windows OS? Answer 1: After some digging, and orientation from lawlist's link, I went for (set-clipboard-coding-system 'utf-16le-dos) Source: https:/
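The answer's fix, as an init-file fragment (the second form is an alternative on Emacs versions where the clipboard setter is deprecated; adjust to taste):

```elisp
;; Decode the Windows clipboard as UTF-16LE, the native Windows
;; clipboard format, so pasted non-ASCII text is not shown as \u escapes.
(set-clipboard-coding-system 'utf-16le-dos)
;; Equivalent on newer Emacsen, where the setter above is obsolete:
(setq selection-coding-system 'utf-16le-dos)
```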

How can I programmatically determine the maximum size in bytes of a character in a specific charset?

ε祈祈猫儿з submitted on 2020-01-05 04:08:41
Question: I am getting all supported charsets by using: Object[] Charsets = Charset.availableCharsets().keySet().toArray(); I now need to iterate through each character that can be encoded in that charset. To do this I thought about using the maximum number of bytes for each encoding and going from Byte.MIN_VALUE to Byte.MAX_VALUE for each byte. I then pass that byte array through the String constructor that accepts a byte[] array and a specific encoding. However, I can't find any clues on how I
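Charset exposes exactly this figure: newEncoder().maxBytesPerChar() reports the worst-case number of bytes one char (a UTF-16 code unit) can produce, as a float because some stateful encodings add shift sequences. A short sketch that prints it for every charset that supports encoding:

```java
import java.nio.charset.Charset;

public class MaxBytes {
    public static void main(String[] args) {
        for (String name : Charset.availableCharsets().keySet()) {
            Charset cs = Charset.forName(name);
            // Some charsets are decode-only; newEncoder() would throw
            // UnsupportedOperationException for those, so guard first.
            if (cs.canEncode()) {
                float max = cs.newEncoder().maxBytesPerChar();
                System.out.printf("%s -> %.1f%n", name, max);
            }
        }
    }
}
```

For example, US-ASCII reports 1.0 and UTF-8 reports 3.0 (a supplementary character is two chars, so the per-char maximum stays at 3 bytes).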

regexp for all accented characters in Oracle

匆匆过客 submitted on 2020-01-04 15:15:17
Question: I am trying to find data that has accented characters. I've tried this: select * from xml_tmp where regexp_like (XMLType.getClobVal(xml_tmp.xml_data), unistr('\0090')) And it works: it finds all records where the XML data field contains É. The problem is that it only matches the upper-case E with an accent. I tried to write a more generic query to find ALL data with accented vowels (a, e, i, o, u, upper and lower case, with any accent) using equivalence classes. I wanted a regex to match only
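A sketch of the equivalence-class approach, with the caveat that a class like [[=e=]] also matches the unaccented base letter, so a second condition is needed to keep only accented hits (the non-ASCII filter below is a crude placeholder, not from the question):

```sql
-- [[=e=]] matches e, é, è, ê, ë and so on; one class per vowel.
SELECT *
  FROM xml_tmp
 WHERE REGEXP_LIKE(XMLType.getClobVal(xml_tmp.xml_data),
                   '[[=a=][=e=][=i=][=o=][=u=]]', 'i')
   -- crude extra filter: row also contains some non-ASCII character,
   -- which rules out matches on plain unaccented vowels alone
   AND REGEXP_LIKE(XMLType.getClobVal(xml_tmp.xml_data), '[^ -~]');
```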

Avoiding SSIS script task to convert utf-8 to unicode for AS400 data to SQL Server

蹲街弑〆低调 submitted on 2020-01-04 14:31:13
Question: After many tries I have concluded that the optimal way to transfer data from AS400 (non-Unicode) to SQL Server with SSIS is: use the native transfer utility to dump data to TSV (tab-delimited) files; convert the files from UTF-8 to Unicode; use BULK INSERT to put them into SQL Server. For step #2 I found ready-made code that does this: string from = @"\\appsrv02\c$\bg_f0101.tsv"; string to = @"\\appsrv02\c$\bg_f0101.txt"; using (StreamReader reader = new StreamReader(from, Encoding.UTF8, false, 1000000)
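If the script task is to be avoided entirely, step #2 can also be done outside SSIS with a command-line conversion; a sketch using iconv (file names are placeholders for the UNC paths in the question; the two leading bytes are the UTF-16LE BOM that many consumers expect):

```shell
# Convert the dumped TSV from UTF-8 to UTF-16LE ("Unicode" in SQL
# Server terms), prefixing the byte-order mark by hand since iconv's
# UTF-16LE output omits it.
printf '\377\376' > bg_f0101.txt
iconv -f UTF-8 -t UTF-16LE bg_f0101.tsv >> bg_f0101.txt
```

An SSIS Execute Process Task can run this conversion between the dump and the bulk insert, replacing the C# script task.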