cjk | 易学教程 (Yixue Tutorials)

Convert or extract TTC font to TTF - how to?

Question: I have already spent more than 8 hours trying to make the STHeiti Medium.ttc.zip font work on Windows, but I can't get it to work. Is anybody able to make it work on Windows?

Answer 1: Assuming that Windows doesn't really know how to deal with TTC files (which I honestly find strange), you can easily "split" the combined fonts with FontForge. The steps are:

1. Download the file.
2. Unzip it (e.g., unzip "STHeiti Medium.ttc.zip").
3. Launch FontForge.
4. Open the file with FontForge (e.g., File > Open).

Programming tips with Japanese Language/Characters [closed]

Question: (Closed 2 years ago: this question needs to be more focused.) I have ideas for a few web apps I'd like to write to help me, and maybe others, learn Japanese better, since I am studying the language. My problem is that the site will be mostly in English, so it needs to mix in Japanese characters fluently: usually hiragana and katakana, but later kanji. I am …

Python: any way to perform this “hybrid” split() on multi-lingual (e.g. Chinese & English) strings?

Question: I have multilingual strings that mix languages that use whitespace as a word separator (English, French, etc.) with languages that don't (Chinese, Japanese, Korean). Given such a string, I want to split the English/French/etc. part into words using whitespace as the separator, split the Chinese/Japanese/Korean part into individual characters, and put all of the resulting components into a list. Some examples will probably make this clear: Case 1: English …
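One way to get this behaviour in Python is a single regular expression that matches either one CJK character or a run of non-space, non-CJK characters. This is a sketch, not an answer from the thread. The Unicode ranges below (hiragana/katakana, CJK Unified Ideographs, Hangul syllables) are an assumption and do not cover every CJK block, for example CJK Extension A or half-width katakana.

```python
import re

# Assumed CJK ranges: hiragana + katakana, CJK Unified Ideographs, Hangul syllables.
CJK = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af"
# Either a single CJK character, or a run of characters that are neither space nor CJK.
TOKEN = re.compile(rf"[{CJK}]|[^\s{CJK}]+")


def hybrid_split(text):
    """Split on whitespace for non-CJK text and into single characters for CJK text."""
    return TOKEN.findall(text)
```

For example, `hybrid_split("Hello中文world")` gives `["Hello", "中", "文", "world"]`. Punctuation outside the listed ranges (such as `。`) is treated like a non-CJK token, so extend `CJK` if your data needs that.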

Django: How to add Chinese support to the application

Question: I am trying to add Chinese to my application, which is written in Django, and I am having a really hard time with it. I have spent half a day trying different approaches, with no success. My application supports a few languages; this is part of the settings.py file:

    TIME_ZONE = 'Europe/Dublin'
    LANGUAGE_CODE = 'en'
    LOCALES = (
        # English
        ('en', u'English'),
        # Norwegian
        ('no', u'Norsk'),
        # Finnish
        ('fi', u'Suomi'),
        # Simplified Chinese
        ('zh-CN', u'简体中文'),
        # Traditional Chinese
        ('zh-TW', u'繁體中文'),
        # Japanese
        ('ja', u …
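For reference, this is roughly how Chinese is usually declared with Django's own `LANGUAGES` setting. The `LOCALES` tuple above appears to be project-specific. This configuration sketch assumes Django 2.0 or later. In those versions, Chinese is identified by script with the language codes `zh-hans` and `zh-hant` rather than `zh-cn` and `zh-tw`. The matching translation catalogs then live under `locale/zh_Hans/` and `locale/zh_Hant/`.

```python
# settings.py fragment: a sketch assuming Django 2.0 or later.
from django.utils.translation import gettext_lazy as _

LANGUAGE_CODE = "en"
USE_I18N = True

# Django's codes for Chinese name the script: zh-hans (Simplified), zh-hant (Traditional).
LANGUAGES = [
    ("en", _("English")),
    ("zh-hans", _("Simplified Chinese")),
    ("zh-hant", _("Traditional Chinese")),
    # ... the project's other languages ...
]

MIDDLEWARE = [
    "django.contrib.sessions.middleware.SessionMiddleware",
    # LocaleMiddleware goes after SessionMiddleware and before CommonMiddleware.
    "django.middleware.locale.LocaleMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the project's other middleware ...
]
```

With that in place, `django-admin makemessages -l zh_Hans` should generate the message catalog for Simplified Chinese.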

Word break in languages without spaces between words (e.g., Asian)?

Question: I'd like to make MySQL full-text search work with Japanese and Chinese text, as well as any other language. The problem is that these languages, and probably others, do not normally put whitespace between words. Search is not useful when you must type the exact sentence as it appears in the text. I cannot just put a space between every character, because English must work too. I would like to solve this problem with PHP or MySQL. Can I configure MySQL to recognize characters which should be their own …
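A common workaround, not taken from this thread, is to index overlapping character bigrams for CJK runs while keeping space-delimited words whole, and then tokenize search queries the same way. The Python sketch below shows only the tokenization step. The CJK ranges and the function name `index_tokens` are assumptions. MySQL 5.7.6 and later also ship a built-in `ngram` full-text parser (`FULLTEXT ... WITH PARSER ngram`) that applies a similar idea on the server side.

```python
import re

# Assumed CJK ranges: hiragana + katakana, CJK Unified Ideographs, Hangul syllables.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]+")
WORD = re.compile(r"\w+")


def index_tokens(text):
    """Lowercased words for space-delimited text, overlapping bigrams for CJK runs."""
    tokens, pos = [], 0
    for m in CJK_RUN.finditer(text):
        # Ordinary words between the previous CJK run and this one.
        tokens.extend(w.lower() for w in WORD.findall(text[pos:m.start()]))
        run = m.group()
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        pos = m.end()
    tokens.extend(w.lower() for w in WORD.findall(text[pos:]))
    return tokens
```

You might store `" ".join(index_tokens(text))` in a separate column with a FULLTEXT index. Note that MySQL ignores tokens shorter than its minimum token length (`innodb_ft_min_token_size` for InnoDB, `ft_min_word_len` for MyISAM). Two-character bigrams are therefore only indexed if that setting is lowered to 2.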

Recognizing text as Simplified vs. Traditional Chinese

Given a block of text that's known to be Chinese and encoded in UTF-8, is there a way to determine whether it's Simplified or Traditional?

Answer: I don't know if this will work, but I'd try using iconv to see whether it translates between the character sets correctly, comparing the results of the same conversion with //TRANSLIT and //IGNORE. If the two results match, the conversion hasn't encountered any characters that fail to translate into Big5 (a Traditional Chinese character set), so you should have a match:

    $test1 = iconv("UTF-8", "big5//TRANSLIT", $text);
    $test2 = iconv("UTF-8", "big5//IGNORE", $text);
    if ($test1 == $test2) {
        echo …

Convert numbered pinyin to pinyin with tone marks

Are there any scripts, libraries, or programs using Python or Bash tools (e.g. awk, perl, sed) that can correctly convert numbered pinyin (e.g. dian4 nao3) to UTF-8 pinyin with tone marks (e.g. diàn nǎo)? I have found the following examples, but they require PHP or C#:

- PHP: Convert numbered to accentuated Pinyin?
- C#: Any libraries to convert number Pinyin to Pinyin with tone markings?

I have also found various online tools, but they cannot handle a large number of conversions.

Answer: I've got some Python 3 code that does this, and it's small enough to just put directly in the answer here.
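The answer's own Python 3 code is not included in this excerpt. As a stand-in, here is a minimal sketch (not the answerer's code) that applies the usual tone-mark placement rule: `a` or `e` takes the mark if present, `o` takes it in `ou`, and otherwise the last vowel does. It assumes lowercase input, tone digits 1 to 5 with 5 for the neutral tone, and `v` or `u:` for `ü`.

```python
import re

# Tone-marked vowels, indexed by tone number 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}
# Lowercase letters (ü may be typed as "v" or "u:") followed by a tone digit.
SYLLABLE = re.compile(r"([a-zü:]+)([1-5])")


def _add_tone_mark(match):
    syllable = match.group(1).replace("u:", "ü").replace("v", "ü")
    tone = int(match.group(2))
    if tone == 5:  # neutral tone: drop the digit, no mark
        return syllable
    if "a" in syllable:
        idx = syllable.index("a")
    elif "e" in syllable:
        idx = syllable.index("e")
    elif "ou" in syllable:
        idx = syllable.index("o")
    else:  # otherwise the last vowel takes the mark
        idx = max((i for i, ch in enumerate(syllable) if ch in TONE_MARKS), default=-1)
        if idx < 0:  # no vowel to mark: return the letters without the digit
            return syllable
    return syllable[:idx] + TONE_MARKS[syllable[idx]][tone - 1] + syllable[idx + 1:]


def numbered_to_marked(text):
    """Convert numbered pinyin such as "dian4 nao3" to tone-marked pinyin."""
    return SYLLABLE.sub(_add_tone_mark, text)
```

Because this works on whole strings, it can process a large file in one call, e.g. `numbered_to_marked(open("words.txt", encoding="utf-8").read())`.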

Any tools to programmatically convert Japanese sentence into its romaji (phonetical reading)? [closed]

(Closed as off-topic for Stack Overflow.) Input: 日本が好きです. Output: Nippon ga sukidesu. Unfortunately, phonetic readings are not available through the Google Translate API.

Answer: KAKASI is a good, simple tool for what you want to do:

    % echo "日本が好きです。" | iconv -f utf8 -t eucjp | kakasi -i euc -Ha -Ka -Ja -Ea -ka
    nippongasukidesu.
    % echo "日本が好きです。" | iconv -f utf8 -t eucjp | kakasi -i euc -w | kakasi -i euc -Ha -Ka -Ja -Ea -ka
    nippon ga suki desu .

Or another solution is to …
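Converting kanji needs a dictionary, which is what KAKASI provides, but the kana part of the problem is a simple table lookup. The sketch below is not from the thread. It covers basic hiragana/katakana, the voiced rows, small ゃ/ゅ/ょ combinations, and the doubling effect of っ in a Hepburn-like style. It leaves kanji, the long-vowel mark ー, and any other characters unchanged. Those limits, and the choice of `o` for を, are assumptions of this sketch.

```python
# Basic kana rows, with vowels in the order a, i, u, e, o.
_ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ", "t": "たちつてと",
    "n": "なにぬねの", "h": "はひふへほ", "m": "まみむめも", "r": "らりるれろ",
    "g": "がぎぐげご", "z": "ざじずぜぞ", "d": "だぢづでど", "b": "ばびぶべぼ",
    "p": "ぱぴぷぺぽ",
}
KANA = {ch: c + v for c, row in _ROWS.items() for ch, v in zip(row, "aiueo")}
# Hepburn exceptions plus the y/w rows and ん.
KANA.update({"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu", "じ": "ji",
             "ぢ": "ji", "づ": "zu", "や": "ya", "ゆ": "yu", "よ": "yo",
             "わ": "wa", "を": "o", "ん": "n"})
SMALL_Y = {"ゃ": "a", "ゅ": "u", "ょ": "o"}


def kana_to_romaji(text):
    """Romanize hiragana/katakana; characters not in the table pass through."""
    # Shift katakana (ァ..ヶ) down to the matching hiragana code points.
    text = "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)
    out, i, double_next = [], 0, False
    while i < len(text):
        ch = text[i]
        if ch == "っ":  # small tsu doubles the next consonant
            double_next = True
            i += 1
            continue
        rom = KANA.get(ch, ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in KANA and rom.endswith("i") and nxt in SMALL_Y:
            base = rom[:-1]  # e.g. "ki" -> "k", "shi" -> "sh"
            vowel = SMALL_Y[nxt]
            rom = base + vowel if base in ("sh", "ch", "j") else base + "y" + vowel
            i += 1  # consume the small ya/yu/yo as well
        if double_next and ch in KANA:
            rom = ("t" if rom.startswith("ch") else rom[0]) + rom
        double_next = False
        out.append(rom)
        i += 1
    return "".join(out)
```

For example, `kana_to_romaji("すきです")` gives `"sukidesu"`, while `kana_to_romaji("日本が")` gives `"日本ga"`, because the kanji still need a dictionary-based tool such as KAKASI.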

convert unicode into character with ruby

I found a dictionary of Chinese characters in Unicode. I'm trying to build a database of characters from this dictionary, but I don't know how to convert a Unicode code point to a character:

    p "国".unpack("U*").first # this gives the code point 22269

How can I convert 22269 back into the character, i.e. do the opposite of the line above?

Answer:

    [22269].pack('U*') #=> "国" or "\345\233\275"

Edit: Works in Ruby 1.8.6+ (verified in 1.8.6, 1.8.7, and 1.9.2). In 1.8.x you get a three-byte string representing the single Unicode character, but using puts on that causes the correct Chinese character to appear in the …
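For comparison, the same round trip in Python 3 needs no pack/unpack step, because strings are sequences of code points. This snippet only illustrates the conversion and is not part of the original answer.

```python
code_point = ord("国")              # character -> code point: 22269 (U+56FD)
char = chr(code_point)              # code point -> character: "国"
utf8_bytes = char.encode("utf-8")   # the three bytes Ruby 1.8 shows as "\345\233\275"
```

The octal escapes `\345\233\275` in the Ruby output are the same bytes as Python's `b"\xe5\x9b\xbd"`, the UTF-8 encoding of U+56FD.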

How to classify Japanese characters as either kanji or kana?

Given the text below, how can I classify each character as kana or kanji?

    誰か確認上記これらのフ

I want to get something like this:

    誰 - kanji
    か - kana
    確 - kanji
    認 - kanji
    上 - kanji
    記 - kanji
    こ - kana
    れ - kana
    ら - kana
    の - kana
    フ - kana

(Sorry if I did it incorrectly.)

Answer (Josh Lee): This functionality is built into the Character.UnicodeBlock class. Some examples of the Unicode blocks related to the Japanese language:

    Character.UnicodeBlock.of('誰') == CJK_UNIFIED_IDEOGRAPHS
    Character.UnicodeBlock.of('か') == HIRAGANA
    Character.UnicodeBlock.of('フ') == KATAKANA
    Character.UnicodeBlock.of('ﾌ') == HALFWIDTH_AND_FULLWIDTH …
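Python's `unicodedata` module does not appear to expose Unicode blocks the way Java's `Character.UnicodeBlock` does. A similar classification can still be sketched with `unicodedata.name`, which returns names such as `CJK UNIFIED IDEOGRAPH-8AB0` or `HIRAGANA LETTER KA`. This name-prefix check is an assumption standing in for a block lookup, not the answer's method. Note that it also labels marks such as ー (`KATAKANA-HIRAGANA PROLONGED SOUND MARK`) as kana.

```python
import unicodedata


def classify(ch):
    """Return "kanji", "kana", or "other" based on the character's Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")):
        return "kanji"
    # Covers hiragana, full-width katakana and half-width katakana such as ﾌ.
    if name.startswith(("HIRAGANA", "KATAKANA", "HALFWIDTH KATAKANA")):
        return "kana"
    return "other"


def classify_text(text):
    """Pair every character of text with its classification."""
    return [(ch, classify(ch)) for ch in text]
```

Running `classify_text("誰か確認上記これらのフ")` reproduces the kanji/kana labels listed in the question.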