cjk

Converting zenkaku characters to hankaku and vice-versa in C#

☆樱花仙子☆ submitted on 2019-11-30 05:04:47
Question: As the title says, I want to convert zenkaku (full-width) characters to hankaku (half-width) ones and vice versa in C#, but can't figure out how to do it. For example, "ラーメン" to "ﾗｰﾒﾝ" and the other way around. Would it be possible to write this as a method that automatically determines which direction the conversion should go, based on the format of the input? Answer 1: You can use the Strings.StrConv() method by including a reference to Microsoft.VisualBasic.dll, or you can P/Invoke the LCMapString() native function:
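The answer's C# code is cut off in this excerpt. As a language-neutral sketch of the same idea (written in Python rather than C#, and covering only katakana and Japanese punctuation, which is an assumption about scope), half-width katakana can be widened with Unicode NFKC normalization, and the reverse direction can use a table derived from that same normalization. The helper names `to_zenkaku`, `to_hankaku` and `convert` are hypothetical, and the "guess the direction" rule is just one possible heuristic.

```python
import re
import unicodedata

# Half-width katakana and Japanese punctuation live in U+FF61..U+FF9F.
HANKAKU_RE = re.compile("[\uff61-\uff9f]+")

# Reverse table derived from NFKC: full-width form -> half-width form.
# It also maps the combining voiced marks U+3099/U+309A to ﾞ/ﾟ.
TO_HANKAKU = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp)
    for cp in range(0xFF61, 0xFFA0)
}


def to_zenkaku(text: str) -> str:
    """Widen runs of half-width katakana; NFKC also merges ｶﾞ into ガ."""
    return HANKAKU_RE.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


def to_hankaku(text: str) -> str:
    """Narrow full-width katakana; ガ is split into ｶ + ﾞ via NFD."""
    out = []
    for ch in text:
        parts = unicodedata.normalize("NFD", ch)
        if all(p in TO_HANKAKU for p in parts):
            out.append("".join(TO_HANKAKU[p] for p in parts))
        else:
            out.append(ch)  # no half-width form: kanji, hiragana, Latin, ...
    return "".join(out)


def convert(text: str) -> str:
    """Guess the direction: widen if any half-width katakana is present."""
    return to_zenkaku(text) if HANKAKU_RE.search(text) else to_hankaku(text)


print(convert("ラーメン"))  # ﾗｰﾒﾝ
print(convert("ﾗｰﾒﾝ"))  # ラーメン
```

Deriving the reverse table from NFKC keeps both directions consistent with each other instead of relying on a hand-typed mapping. Full-width Latin letters and digits are deliberately left alone here, since NFKC would narrow them, which is the opposite of what `to_zenkaku` is for.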

Are Chinese characters allowed in URLs?

自作多情 submitted on 2019-11-30 04:50:44
Question: Are Chinese characters allowed in URLs? In my tests, Chinese characters can be entered in URLs: they are converted to Punycode, the request is sent, and it reaches the right page. But does anyone currently validate website URLs in a way that also allows Chinese characters? Answer 1: Punycode exists so that non-Latin scripts can be used in software that does not support them. So while I like my site http://見.香港/ I can enter http://xn--nw2a.xn-
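Only the host name goes through Punycode (IDNA); non-ASCII characters in the path and query are instead percent-encoded as UTF-8. A small Python sketch of both steps, using the standard library's `idna` codec (which implements the older IDNA 2003 rules) and `urllib.parse.quote`. The function name `to_ascii_url` and the sample path `中文/` are illustrative assumptions.

```python
from urllib.parse import quote


def to_ascii_url(host: str, path: str) -> str:
    """Build an ASCII-only URL from a Unicode host name and path."""
    # Each host label is converted to Punycode ("xn--...") by the idna codec.
    ascii_host = host.encode("idna").decode("ascii")
    # Path characters are UTF-8 encoded, then percent-escaped; "/" is kept.
    ascii_path = quote(path, safe="/")
    return f"http://{ascii_host}/{ascii_path}"


url = to_ascii_url("見.香港", "中文/")
print(url)
```

Validation code that wants to accept such URLs can run the same conversion first and then validate the ASCII form.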

Split a sentence into separate words

守給你的承諾、 submitted on 2019-11-30 00:33:54
I need to split a Chinese sentence into separate words. The problem with Chinese is that there are no spaces. For example, the sentence may look like 主楼怎么走 ("how do I get to the main building?"); with spaces it would be 主楼 怎么 走. At the moment I can think of one solution. I have a dictionary of Chinese words (in a database). The script will try to find the first two characters of the sentence (主楼) in the database. If 主楼 is actually a word and it's in the database, the script will try to find the first three characters (主楼怎). 主楼怎 is not a word, so it's not in the database => my application now knows that 主楼 is a separate word.
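The dictionary idea above is close to greedy forward maximum matching, a common baseline for Chinese segmentation. The variant sketched here (a swapped-in technique, not the asker's exact procedure) tries the longest candidate first and shrinks it, so it still works when an intermediate prefix is not a word but a longer one is. The in-memory `set` stands in for the database, and `segment` is a hypothetical name.

```python
def segment(sentence: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching against a word list."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is accepted even if unknown, so we always advance.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


print(segment("主楼怎么走", {"主楼", "怎么", "走"}))
```

Greedy matching is fast but can pick a wrong split when two dictionary words overlap; statistical segmenters address that, at the cost of needing training data.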

How to split Chinese characters in PHP?

余生长醉 submitted on 2019-11-29 23:43:06
Question: I need some help splitting Chinese characters mixed with English words and numbers in PHP. For example, if I read FrontPage 2000中文版應用大全, I'm hoping to get FrontPage, 2000, 中, 文, 版, 應, 用, 大, 全 or FrontPage, 2, 0, 0, 0, 中, 文, 版, 應, 用, 大, 全. How can I achieve this? Thanks in advance :) Answer 1: Assuming you are using UTF-8 (or can convert the text to UTF-8 with iconv or another tool), use the u modifier (doc: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php ) <? $s =
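The answer's PHP is cut off. The same regex idea, sketched in Python: match runs of Latin letters, runs of digits (or single digits for the second output format), or one CJK ideograph at a time. The range `\u4e00-\u9fff` covers only the main CJK Unified Ideographs block, an assumption that is enough for this sample but misses the extension blocks. In PHP the character class would be written `[\x{4e00}-\x{9fff}]` with the `/u` modifier and used with `preg_match_all`.

```python
import re

# Runs of Latin letters, runs of digits, or one CJK ideograph at a time.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+|[\u4e00-\u9fff]")
# Same, but every digit becomes its own token.
TOKEN_DIGITS_RE = re.compile(r"[A-Za-z]+|[0-9]|[\u4e00-\u9fff]")

text = "FrontPage 2000中文版應用大全"
print(TOKEN_RE.findall(text))
print(TOKEN_DIGITS_RE.findall(text))
```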

'𠂉' not a valid Unicode character, but in the Unicode character set?

别来无恙 submitted on 2019-11-29 23:00:31
Question: Short story: I can't get an entity like '𠂉' to be stored in a MySQL database, either through a text field in a Ruby on Rails app (with the default UTF-8 encoding) or by entering it directly in a MySQL GUI app. As far as I can tell, all Chinese characters and radicals can be entered into the database without problem, but not these rarely typed 'character components'. The character mentioned above is Unicode U+20089, HTML entity &#131209;. I can get it to display on the page by entering <html>𠂉</html>
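The excerpt stops before any answer. One likely explanation, offered as an assumption: U+20089 lies outside the Basic Multilingual Plane, so it takes 4 bytes in UTF-8, while MySQL's legacy `utf8` character set (utf8mb3) stores at most 3 bytes per character. Such characters need the `utf8mb4` character set on the column or table (for example via `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4`) and on the connection (for Rails' mysql2 adapter, `encoding: utf8mb4` in database.yml). The Python sketch below only identifies which strings are affected; it does not talk to MySQL, and the helper names are hypothetical.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character is outside the BMP, i.e. 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


sample = "\U00020089"  # the character 𠂉 from the question
print(hex(ord(sample)), utf8_width(sample), f"&#{ord(sample)};")
print(needs_utf8mb4("中文"), needs_utf8mb4(sample))
```

Common Chinese characters such as 中 are 3 bytes in UTF-8, which is why they store fine while the rare components do not.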

How to convert Chinese characters to Pinyin

喜你入骨 submitted on 2019-11-29 20:43:55
For sorting Chinese text, I want to convert Chinese characters to Pinyin, properly separating each Chinese character and grouping successive characters together. Can you please help with this by providing the logic or source code? Please let me know if an open-source library already exists for this. Answer: Short answer: you don't. Long answer: There is no one-to-one mapping from 汉字 (Chinese characters) to 汉语拼音 (Hanyu Pinyin). Just some quick examples: 把 can be "ba" in the third or fourth tone. 了 can be "le" (toneless) or "liao" (third tone). 乐 can be "le" or "yue", both in the fourth tone. 落 can be "luo",
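To make the "no one-to-one mapping" point concrete, here is a sketch of a per-character lookup that returns every candidate reading. The table is hypothetical and hand-filled with only the answer's examples (tone numbers, no number for the neutral tone); real data could come from the reading fields of the Unicode Unihan database. A character-level lookup can only list the options; choosing the right one needs word-level context.

```python
# Hypothetical sample table built from the examples in the answer above.
READINGS = {
    "把": ["ba3", "ba4"],
    "了": ["le", "liao3"],
    "乐": ["le4", "yue4"],
}


def pinyin_candidates(text: str) -> list[list[str]]:
    """List every known reading per character; unknown characters pass through."""
    return [READINGS.get(ch, [ch]) for ch in text]


def is_ambiguous(text: str) -> bool:
    """True if at least one character has more than one candidate reading."""
    return any(len(r) > 1 for r in pinyin_candidates(text))


print(pinyin_candidates("把了"))
print(is_ambiguous("乐"))
```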

Language codes for simplified Chinese and traditional Chinese?

陌路散爱 submitted on 2019-11-29 19:31:23
We are creating multi-language subsites on our website, and I would like to use 2-letter language codes. Spanish and French are easy; they will get URLs like mydomain.com/es and mydomain.com/fr. But I run into a problem with Traditional and Simplified Chinese. Is there a standard for which 2-letter codes to use for these languages? mydomain.com/zh, mydomain.com/? Answer: @dkarp gives an excellent general answer. I will add some specifics regarding Chinese: There are several countries where Chinese is the main written language. The major difference between them is whether they use simplified or
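ISO 639-1 has only the single two-letter code `zh` for Chinese, so the Simplified/Traditional split needs a second subtag. In BCP 47 (RFC 5646) that is usually the script subtag: `zh-Hans` for Simplified and `zh-Hant` for Traditional, optionally followed by a region such as `zh-Hant-TW`. Tags are case-insensitive, so lowercase URL slugs like `/zh-hans` are fine. The sketch below restores the conventional casing when the tag is reused elsewhere (for example in an HTML `lang` attribute); `bcp47_case` and the slug list are hypothetical.

```python
def bcp47_case(tag: str) -> str:
    """Apply the conventional BCP 47 letter case, e.g. zh-hant-tw -> zh-Hant-TW.

    Tags are case-insensitive, so this only affects presentation.
    """
    parts = tag.replace("_", "-").split("-")
    out = [parts[0].lower()]  # primary language subtag: lowercase
    after_singleton = False
    for sub in parts[1:]:
        if len(sub) == 1:
            after_singleton = True  # extension / private-use parts stay lowercase
        if not after_singleton and sub.isalpha() and len(sub) == 4:
            out.append(sub.title())  # script subtag: Hans, Hant
        elif not after_singleton and sub.isalpha() and len(sub) == 2:
            out.append(sub.upper())  # region subtag: CN, TW, HK
        else:
            out.append(sub.lower())  # numeric regions (419), variants, etc.
    return "-".join(out)


# Hypothetical URL slugs mapped to the tags they stand for.
for slug in ["zh-hans", "zh-hant", "zh-hant-tw"]:
    print(f"mydomain.com/{slug}", "->", bcp47_case(slug))
```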

How do you sort CJK (Asian) characters in Perl, or with any other programming language?

白昼怎懂夜的黑 submitted on 2019-11-29 18:21:23
Question: How do you sort Chinese, Japanese and Korean (CJK) characters in Perl? As far as I can tell, sorting CJK characters by stroke count, then by radical, seems to be the way these languages are sorted. There are also some methods that sort by sound, but this seems less common. I've tried using:

```sh
perl -e 'print join(" ", sort qw(工 然 一 人 三 古 二 )), "\n";'
# Prints: 一 三 二 人 古 工 然
```

which is incorrect. I've also tried using Unicode::Collate from CPAN, but it says: By default, CJK Unified Ideographs are
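The Perl output above is plain code-point order (一 U+4E00, 三 U+4E09, 二 U+4E8C, ...). A minimal Python sketch of sorting by stroke count instead, with code point as the tie-breaker rather than radical (a simplification). The stroke table is hand-entered for the sample characters only; real data could come from the Unihan database's `kTotalStrokes` field.

```python
# Hand-entered total stroke counts for the sample characters only.
STROKES = {"一": 1, "二": 2, "人": 2, "三": 3, "工": 3, "古": 5, "然": 12}

chars = ["工", "然", "一", "人", "三", "古", "二"]

# Default sort: code-point order, the same result as the Perl one-liner.
by_code_point = sorted(chars)

# Stroke count first, code point as a tie-breaker.
by_strokes = sorted(chars, key=lambda ch: (STROKES[ch], ord(ch)))

print(" ".join(by_code_point))
print(" ".join(by_strokes))
```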

Flutter: Japanese characters fetched from a server are decoded incorrectly

风流意气都作罢 submitted on 2019-11-29 13:26:59
I am building a mobile app with Flutter. I need to fetch a JSON file from a server that includes Japanese text. Part of the returned JSON is:

```json
{
  "id": "egsPu39L5bLhx3m21t1n",
  "userId": "MCetEAeZviyYn5IMYjnp",
  "userName": "巽 裕亮",
  "content": "フルマラソン完走に対して2018/05/06のふりかえりを行いました!"
}
```

Trying the same request in Postman or Chrome gives the expected result (the Japanese characters are rendered properly in the output). But when the data is fetched with Dart using the following code snippet:

```dart
import 'dart:convert';
import 'package:http/http.dart' as http;

//irrelevant parts have been omitted
final response =
```
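The excerpt ends before the answer. A common cause, offered here as an assumption: package:http's `Response.body` decodes using the charset from the `Content-Type` header and falls back to Latin-1 when none is given, so UTF-8 Japanese text turns into mojibake. Decoding the raw bytes explicitly, with `utf8.decode(response.bodyBytes)` from `dart:convert`, avoids the fallback. The Python sketch below reproduces the effect and the fix on a word from the question's JSON.

```python
content = "ふりかえり"
raw = content.encode("utf-8")  # the bytes the server actually sends

wrong = raw.decode("latin-1")  # what a Latin-1 fallback would produce
right = raw.decode("utf-8")    # explicit UTF-8 decoding

# Latin-1 maps every byte to one character, so the mistake is reversible.
repaired = wrong.encode("latin-1").decode("utf-8")

print(wrong)
print(right, repaired)
```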

How to use Boost Spirit to parse Chinese (Unicode, UTF-16)?

对着背影说爱祢 submitted on 2019-11-29 11:30:56
My program does not recognize Chinese. How can I use Spirit to recognize Chinese? I use wstring and have converted the input to UTF-16. Here is my header file:

```cpp
#pragma once
#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
#include <map>
using namespace std;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
typedef pair<wstring,wstring> WordMeaningType;
typedef vector<WordMeaningType> WordMeaningsType;
typedef pair<wstring,WordMeaningsType> WordType;
typedef vector<WordType> WordListType;
struct WordPaser :qi::grammar<wstring:
```
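The excerpt is cut off before the grammar and any answer. Two things worth checking, both assumptions rather than a confirmed diagnosis. First, parsers from the `ascii` namespace only classify characters 0 to 127, so a grammar built from `ascii::char_` or `ascii::space` is likely to reject Chinese; Spirit's `standard_wide` namespace, or `unicode` when `BOOST_SPIRIT_UNICODE` is defined, are the wide-character alternatives. Second, with a 16-bit `wchar_t` (as on Windows), characters outside the BMP arrive as two surrogate code units. The Python sketch below shows what a UTF-16 `wstring` holds; `utf16_units` is a hypothetical helper.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units a 16-bit wchar_t string would hold."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


print([hex(u) for u in utf16_units("中文")])        # one unit per character
print([hex(u) for u in utf16_units("\U00020089")])  # a surrogate pair
```

Everyday Chinese characters are in the BMP, so if even 中 fails to parse, the character-class namespace is a more likely culprit than surrogates.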