cjk | 易学教程

Converting zenkaku characters to hankaku and vice-versa in C#

Question: As the title says, I want to convert zenkaku (full-width) characters to hankaku (half-width) ones and vice versa in C#, but I can't figure out how to do it. For example, "ラーメン" to "ﾗｰﾒﾝ" and the other way around. Would it be possible to write this as a method that automatically determines which way the conversion needs to go, based on the format of the input?

Answer 1: You can use the Strings.StrConv() method by including a reference to Microsoft.VisualBasic.dll, or you can p/invoke the LCMapString() native function:
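
The excerpt is cut off before the answer's code. Separately from the two .NET routes it names, the underlying idea can be sketched in Python (not C#) with Unicode normalization: half-width katakana carry a compatibility mapping to their full-width forms, so NFKC widens them, and inverting that mapping narrows them. The function names here are made up for this sketch, and the direction guess in `convert_auto` is only one possible heuristic.

```python
import re
import unicodedata

# Half-width katakana and punctuation occupy U+FF61..U+FF9F.
_HANKAKU_RUN = re.compile(r"[\uFF61-\uFF9F]+")

# Reverse table: full-width character -> half-width character, derived from
# each half-width character's compatibility (NFKC) mapping.
_TO_HANKAKU = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp)
    for cp in range(0xFF61, 0xFFA0)
}


def to_zenkaku(text: str) -> str:
    """Replace runs of half-width katakana with full-width katakana."""
    return _HANKAKU_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


def to_hankaku(text: str) -> str:
    """Replace full-width katakana with half-width katakana."""
    out = []
    for ch in text:
        # Voiced kana decompose into a base kana plus a combining sound mark.
        parts = unicodedata.normalize("NFD", ch)
        if all(p in _TO_HANKAKU for p in parts):
            out.append("".join(_TO_HANKAKU[p] for p in parts))
        else:
            out.append(ch)  # leave everything else untouched
    return "".join(out)


def convert_auto(text: str) -> str:
    """Hypothetical heuristic: any half-width katakana means 'widen'."""
    if _HANKAKU_RUN.search(text):
        return to_zenkaku(text)
    return to_hankaku(text)
```

Only the katakana block is handled; full-width Latin letters and digits would need a similar table.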

Are Chinese characters allowed to be entered in URLs?

Question: Are Chinese characters allowed to be entered in URLs? In my tests, Chinese characters can be entered in a URL: the address is converted to Punycode, the request is sent, and the expected page is reached. But does anybody currently validate website URLs in a way that allows Chinese characters as well?

Answer 1: Punycode exists to be able to use non-Latin scripts in non-supported software. So whilst I like my site http://見.香港/ I can enter http://xn--nw2a.xn-
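
The conversion the answer describes can be tried with Python's built-in `idna` codec. This is a minimal sketch: the helper names are invented here, and the codec implements the older IDNA 2003 rules, so results for some scripts may differ from what current browsers produce.

```python
def to_ascii_host(host: str) -> str:
    """Encode each non-ASCII label of a hostname as an xn-- Punycode label."""
    return host.encode("idna").decode("ascii")


def to_unicode_host(ascii_host: str) -> str:
    """Decode xn-- labels back to their Unicode form."""
    return ascii_host.encode("ascii").decode("idna")


ascii_host = to_ascii_host("見.香港")
print(ascii_host)                   # every label now starts with "xn--"
print(to_unicode_host(ascii_host))  # back to the Unicode hostname
```

A validator that wants to accept such hostnames could run its existing ASCII-only checks on the output of `to_ascii_host`.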

Split a sentence into separate words

I need to split a Chinese sentence into separate words. The problem with Chinese is that there are no spaces. For example, the sentence may look like: 主楼怎么走 (with spaces it would be: 主楼 怎么 走). At the moment I can think of one solution. I have a dictionary of Chinese words (in a database). The script will try to find the first two characters of the sentence in the database (主楼). If 主楼 is actually a word and is in the database, the script will try to find the first three characters (主楼怎). 主楼怎 is not a word, so it's not in the database, and my application now knows that 主楼 is a separate word.
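
A close relative of the approach described above is forward maximum matching: instead of growing the prefix until the lookup fails, try the longest candidate first and shrink it. The sketch below assumes the dictionary fits in an in-memory set (the question keeps it in a database), and the three-word dictionary is invented for the example.

```python
def segment(sentence: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching against a word set."""
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary, just large enough for the example sentence.
WORDS = {"主楼", "怎么", "走"}
print(segment("主楼怎么走", WORDS))
```

Greedy matching can pick the wrong split when words overlap, so treat this as a baseline rather than a complete segmenter.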

How to split Chinese characters in PHP?

Question: I need some help with splitting Chinese characters mixed with English words and numbers in PHP. For example, if I read FrontPage 2000中文版應用大全 I'm hoping to get FrontPage, 2000, 中, 文, 版, 應, 用, 大, 全 or FrontPage, 2, 0, 0, 0, 中, 文, 版, 應, 用, 大, 全. How can I achieve this? Thanks in advance :)

Answer 1: Assuming you are using UTF-8 (or you can convert it to UTF-8 using iconv or some other tool), then using the u modifier (doc: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php ) <? $s =
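
The answer's PHP code is cut off in this excerpt. As an illustration of the same regular-expression idea in Python (not the answer's own code), the sketch below produces the first desired output: runs of Latin letters, runs of digits, and individual CJK ideographs. The pattern and function name are assumptions, and the range covers only the basic CJK Unified Ideographs block.

```python
import re

# Runs of ASCII letters, runs of ASCII digits, or one ideograph at a time.
_TOKEN = re.compile(r"[A-Za-z]+|[0-9]+|[\u4e00-\u9fff]")


def split_mixed(text: str) -> list[str]:
    """Split mixed English/Chinese text into words, numbers and ideographs."""
    return _TOKEN.findall(text)


print(split_mixed("FrontPage 2000中文版應用大全"))
```

Changing `[0-9]+` to `[0-9]` would yield the second variant, with each digit as its own token.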

'𠂉' Not a valid unicode character, but in the unicode character set?

Question: Short story: I can't get a character like '𠂉' to store in a MySQL database, either by using a text field in a Ruby on Rails app (with default UTF-8 encoding) or by inputting it directly with a MySQL GUI app. As far as I can tell, all Chinese characters and radicals can be entered into the database without problem, but not these rarely typed "character components". The character mentioned above is Unicode U+20089 and HTML entity &#131209;. I can get it to display on the page by entering <html>&#131209;</html>
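
The excerpt ends before any answer, so the following is only a likely explanation, not a confirmed one. U+20089 lies outside the Basic Multilingual Plane and needs four bytes in UTF-8, while MySQL's legacy `utf8` character set (utf8mb3) stores at most three bytes per character; `utf8mb4` stores four. The Python sketch below, with helper names invented here, shows how to spot such characters.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def supplementary_chars(text: str) -> list[str]:
    """Characters above U+FFFF, which need four bytes in UTF-8."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


component = "\U00020089"       # the character from the question
print(utf8_width(component))   # 4
print(utf8_width("中"))         # 3
```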

How to convert Chinese characters to Pinyin

For sorting Chinese-language text, I want to convert Chinese characters to Pinyin, properly separating each Chinese character and grouping successive characters together. Can you please help me with this task by providing the logic or source code for doing this? Please let me know if an open-source library already exists for this.

Short answer: you don't. Long answer: There is no one-to-one mapping for 汉字 to 汉语拼音. Just some quick examples: 把 can be "ba" in the third tone or fourth tone. 了 can be "le" toneless or "liao" third tone. 乐 can be "le" or "yue", both in the fourth tone. 落 can be "luo",
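
The answer's point can be made concrete with a small lookup table. The sketch below uses only the three complete examples quoted above, written with tone numbers (no number for the toneless reading); the table and function names are invented for illustration, and a real table would come from a dictionary source.

```python
# Readings taken from the examples in the answer above.
READINGS = {
    "把": ["ba3", "ba4"],
    "了": ["le", "liao3"],
    "乐": ["le4", "yue4"],
}


def ambiguous_chars(text: str, readings: dict[str, list[str]]) -> list[str]:
    """Characters in text that have more than one possible reading."""
    return [ch for ch in text if len(readings.get(ch, [])) > 1]


print(ambiguous_chars("快乐了", READINGS))
```

Any character returned here cannot be converted by a per-character lookup alone; the right reading depends on the surrounding word.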

Language codes for simplified Chinese and traditional Chinese?

We are creating multi-language subsites on our website. I would like to use the two-letter language codes. Spanish and French are easy. They will get URLs like mydomain.com/es and mydomain.com/fr, but I run into a problem with Traditional and Simplified Chinese. Are there standards for which two-letter codes to use for these languages? mydomain.com/zh mydomain.com/?

@dkarp gives an excellent general answer. I will add some additional specifics regarding Chinese: There are several countries where Chinese is the main written language. The major difference between them is whether they use simplified or
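
The answer is cut off here, so what follows is not taken from it. ISO 639-1 has a single two-letter code for Chinese, `zh`; BCP 47 language tags can add a script subtag, giving `zh-Hans` for Simplified and `zh-Hant` for Traditional. A minimal Python sketch of routing URL prefixes to such tags, with a hypothetical mapping and function name:

```python
from typing import Optional

# Hypothetical mapping from the first URL path segment to a BCP 47 tag.
SUBSITE_LANGUAGE_TAGS = {
    "es": "es",
    "fr": "fr",
    "zh-hans": "zh-Hans",  # Simplified Chinese
    "zh-hant": "zh-Hant",  # Traditional Chinese
}


def language_tag_for_path(path: str) -> Optional[str]:
    """Return the language tag for a path like '/zh-Hant/about', if known."""
    first_segment = path.strip("/").split("/", 1)[0].lower()
    return SUBSITE_LANGUAGE_TAGS.get(first_segment)


print(language_tag_for_path("/zh-Hant/about"))
```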

How do you sort CJK (Asian) characters in Perl, or with any other programming language?

Question: How do you sort Chinese, Japanese and Korean (CJK) characters in Perl? As far as I can tell, sorting CJK characters by stroke count, then by radical, seems to be the way these languages are sorted. There are also some methods that sort by sound, but this seems less common. I've tried using: perl -e 'print join(" ", sort qw(工 然 一 人 三 古 二)), "\n";' # Prints: 一 三 二 人 古 工 然 which is incorrect. And I've tried using Unicode::Collate from CPAN, but it says: By default, CJK Unified Ideographs are
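
A plain sort compares code points, which is why the Perl one-liner gives that order. Sorting by stroke count needs an external table. The Python sketch below hard-codes stroke counts for just the seven characters in the question (a full table could be built from the Unihan database's kTotalStrokes field) and, as a simplifying assumption, breaks ties by code point rather than by radical.

```python
# Stroke counts for the example characters only; not a complete table.
STROKES = {"一": 1, "二": 2, "人": 2, "三": 3, "工": 3, "古": 5, "然": 12}


def sort_by_strokes(chars: list[str]) -> list[str]:
    """Sort by stroke count, then code point; unknown characters go last."""
    return sorted(chars, key=lambda ch: (STROKES.get(ch, float("inf")), ord(ch)))


chars = ["工", "然", "一", "人", "三", "古", "二"]
print(" ".join(sorted(chars)))           # code point order: 一 三 二 人 古 工 然
print(" ".join(sort_by_strokes(chars)))  # stroke order:     一 二 人 三 工 古 然
```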

Flutter fetched Japanese character from server decoded wrong

I am building a mobile app with Flutter. I need to fetch a JSON file from a server, and the file includes Japanese text. A part of the returned JSON is: { "id": "egsPu39L5bLhx3m21t1n", "userId": "MCetEAeZviyYn5IMYjnp", "userName": "巽裕亮", "content": "フルマラソン完走に対して2018/05/06のふりかえりを行いました！" } Trying the same request in Postman or Chrome gives the expected result (Japanese characters are rendered properly in the output). But when the data is fetched with Dart by the following code snippet: import 'dart:convert'; import 'package:http/http.dart' as http; //irrelevant parts have been omitted final response =
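
The excerpt stops before the Dart code and any answer, so the diagnosis here is an assumption. A frequent cause of this symptom is that the response bytes are UTF-8 but get decoded with a single-byte fallback such as Latin-1 when the Content-Type header declares no charset. The Python sketch below reproduces the effect with a hypothetical `decode_body` helper; decoding the raw bytes explicitly as UTF-8 restores the text.

```python
import json
from typing import Optional


def decode_body(body: bytes, charset: Optional[str] = None) -> str:
    """Decode a response body, falling back to Latin-1 when no charset is given."""
    return body.decode(charset or "latin-1")


raw = '{"userName": "巽裕亮"}'.encode("utf-8")  # bytes as sent by the server

garbled = json.loads(decode_body(raw))["userName"]          # fallback decoding
correct = json.loads(decode_body(raw, "utf-8"))["userName"]  # explicit UTF-8

print(garbled)  # mojibake: each UTF-8 byte shown as a separate Latin-1 character
print(correct)
```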

How to use Boost Spirit to parse Chinese(unicode utf-16)?

My program does not recognize Chinese. How can I use Spirit to recognize Chinese? I use wstring and have converted the input to UTF-16. Here is my header file: #pragma once #define BOOST_SPIRIT_UNICODE #include <boost/spirit/include/qi.hpp> #include <string> #include <vector> #include <map> using namespace std; namespace qi = boost::spirit::qi; namespace ascii = boost::spirit::ascii; typedef pair<wstring,wstring> WordMeaningType; typedef vector<WordMeaningType> WordMeaningsType; typedef pair<wstring,WordMeaningsType> WordType; typedef vector<WordType> WordListType; struct WordPaser :qi::grammar<wstring: