cjk | 易学教程

Unicode fonts in PyGame

How can I display Chinese characters in PyGame? And what's a good free/libre font to use for this purpose?

SingleNegationElimination: pygame uses SDL_ttf for rendering, so you should be in fine shape as far as rendering goes. unifont.org appears to have extensive resources on open-source fonts for a range of scripts. I grabbed the Cyberbit pan-Unicode font and extracted the included TTF. The following "worked on my machine", which runs Windows Vista Home Basic and Python 2.6:

    # -*- coding: utf-8 -*-
    import pygame, sys
    unistr = u"黒澤明"
    pygame.font.init()
    srf = pygame.display.set_mode((640,480))
    f =

UTF-8 CJK characters not displaying in Java

I've been reading up on Unicode and UTF-8 encoding for a while and I think I understand it, so hopefully this won't be a stupid question: I have a file which contains some CJK characters, and which has been saved as UTF-8. I have various Asian language packs installed and the characters are rendered properly by other applications, so I know that much works. In my Java app, I read the file as follows:

    // Create objects
    fis = new FileInputStream(new File("xyz.sgf"));
    InputStreamReader is = new InputStreamReader(fis, Charset.forName("UTF-8"));
    BufferedReader br = new BufferedReader(is);
    // Read

Understanding Python Unicode and Linux terminal

I have a Python script that writes some strings with UTF-8 encoding. In my script I mainly use the str() function to cast to string. It looks like this:

    mystring="this is unicode string:"+japanesevalues[1]
    #japanesevalues is a list of unicode values, I am sure it is unicode
    print mystring

I don't use the Python terminal, just the standard Linux Red Hat x86_64 terminal. I set the terminal to output UTF-8 characters. If I execute this, it works:

    #python myscript.py
    this is unicode string: カラダーズソフィー

But if I redirect the output:

    #python myscript.py > output

I get the typical error: UnicodeEncodeError: 'ascii' codec
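The error in the excerpt is what an ASCII-only output stream produces when it is handed non-ASCII text. The following is a minimal Python 3 sketch of that behaviour using in-memory streams rather than a real redirected stdout; the helper name `write_text` and the sample string are illustrative assumptions, not part of the original script.

```python
import io

text = "this is unicode string: カラダーズソフィー"

def write_text(encoding):
    """Write `text` through a text stream using `encoding`; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

# An ASCII-only stream cannot encode the katakana and raises UnicodeEncodeError.
try:
    write_text("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing UTF-8 explicitly makes the write independent of the stream's default.
utf8_bytes = write_text("utf-8")
print(ascii_failed, utf8_bytes.decode("utf-8") == text)
```

For a real script, one option is to set the `PYTHONIOENCODING=utf-8` environment variable before running it; on Python 3.7 or later, `sys.stdout.reconfigure(encoding="utf-8")` is another.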

Regular Expression for Japanese characters

I am doing internationalization in Struts. I want to write JavaScript validation for Japanese and English users. I know the regular expression for English users but not for Japanese users. Is it possible to write one regular expression for both sets of users that validates on the basis of Unicode? Please help me.

This thread may be old, but I thought I would add my 2 cents. Here is a regular expression that can be used to match all English alphanumerics, Japanese katakana, hiragana, multibyte alphanumerics (hankaku and zenkaku), and dashes:

    /[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[ａ-ｚＡ-Ｚ０-９]+|[々〆〤]+/u

You can edit
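As a rough illustration of the same idea outside JavaScript, here is a minimal Python sketch that validates a whole input string against one combined character class. The ranges are copied from the expression quoted above; the names `JAPANESE_OR_ALNUM` and `is_valid` are hypothetical, and the class is not an exhaustive definition of Japanese text.

```python
import re

# One character class covering the ranges from the quoted expression.
JAPANESE_OR_ALNUM = re.compile(
    "[一-龠々〆〤"        # common kanji plus iteration and closing marks
    "ぁ-ゔ"               # hiragana
    "ァ-ヴー"             # katakana and the long-vowel mark
    "a-zA-Z0-9"           # half-width (hankaku) alphanumerics
    "ａ-ｚＡ-Ｚ０-９]+"    # full-width (zenkaku) alphanumerics
)

def is_valid(value):
    """Return True when every character of `value` falls in the allowed class."""
    return JAPANESE_OR_ALNUM.fullmatch(value) is not None

print(is_valid("黒澤明"), is_valid("abc123"), is_valid("hello world"))
```

Matching the full string (rather than searching for any match) is what turns the pattern into a validator: a single disallowed character, such as the space in the last call, makes the check fail.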

What are all the Japanese whitespace characters?

Question: I need to split a string and extract words separated by whitespace characters. The source may be in English or Japanese. English whitespace characters include tab and space, and Japanese text uses these too. (IIRC, all widely used Japanese character sets are supersets of US-ASCII.) So the set of characters I need to use to split my string includes the normal ASCII space and tab. But in Japanese there is another space character, commonly called a "full-width space". According to my Mac's
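A minimal Python sketch of the splitting described in the question, assuming only the three separators named in the excerpt: ASCII space, tab, and the full-width (ideographic) space U+3000. The names `SEPARATORS` and `split_words` are illustrative, and other Unicode space characters are deliberately left out.

```python
import re

# ASCII space, tab, and the ideographic (full-width) space U+3000.
SEPARATORS = re.compile("[ \t\u3000]+")

def split_words(text):
    """Split on runs of the separators above, dropping empty pieces."""
    return [word for word in SEPARATORS.split(text) if word]

sample = "東京\u3000大阪 kyoto\tnara"
print(split_words(sample))
```

In Python 3, calling `sample.split()` with no arguments gives the same result for this input, because the built-in whitespace test already treats U+3000 as a space.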

Android default charset when sending http post/put - Problems with special characters

I have configured the Apache HttpClient like so:

    HttpProtocolParams.setContentCharset(httpParameters, "UTF-8");
    HttpProtocolParams.setHttpElementCharset(httpParameters, "UTF-8");

I also include the HTTP header "Content-Type: application/json; charset=UTF-8" for all HTTP POST and PUT requests. I am trying to send HTTP POST/PUT requests with a JSON body that contains special characters (i.e. Chinese characters entered via the Google Pinyin keyboard, symbols, etc.). The characters appear as gibberish in the logs, but I think this is because DDMS does not support UTF-8, as described in this issue. The problem
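The excerpt stops before the actual problem is stated, so the following is only a language-neutral illustration, written in Python rather than against the Android HttpClient API, of the principle behind the configuration above: the bytes of the body have to be produced with the same charset the Content-Type header announces. The payload and variable names are made up for the example.

```python
import json

payload = {"name": "中文", "symbol": "€"}

# Serialize without escaping non-ASCII, then encode explicitly as UTF-8 bytes.
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

headers = {
    "Content-Type": "application/json; charset=UTF-8",
    "Content-Length": str(len(body)),  # byte length, not character count
}

# Decoding with the announced charset round-trips the original data.
decoded = json.loads(body.decode("utf-8"))
print(decoded == payload, headers["Content-Length"])
```

If the body were encoded with a different charset than the header claims, the receiver would decode the multi-byte characters incorrectly, which typically shows up as the kind of gibberish the question mentions.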

Is there any good open-source or freely available Chinese segmentation algorithm available? [closed]

As phrased in the question, I'm looking for a free and/or open-source text-segmentation algorithm for Chinese. I do understand that it is a very difficult task to solve, as there are many ambiguities involved. I know there's Google's API, but it is rather a black box, i.e. it reveals little about what it is doing.

lschin: The keyword for Chinese text segmentation is 中文分词 ("Chinese word segmentation"). Good and active open-source text-segmentation projects:

- 盘古分词 (Pan Gu Segment): C#, Snapshot
- ik-analyzer: Java
- ICTCLAS: C/C++, Java, C#, Demo
- NlpBamboo: C, PHP, PostgreSQL
- HTTPCWS:
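To give a feel for what dictionary-based segmentation involves, here is a minimal Python sketch of forward maximum matching, a simple greedy baseline. It is not the method used by any of the projects listed above, it does not resolve the ambiguities the question mentions, and the toy dictionary and function name are assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

toy_dictionary = {"中文", "分词", "算法", "开源"}
print(forward_max_match("开源中文分词算法", toy_dictionary))
# prints ['开源', '中文', '分词', '算法']
```

Characters that are not covered by the dictionary come out as single-character tokens, which is one reason real segmenters combine dictionaries with statistical models.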

Testing Android Market in-app billing with dummy credit card credentials

Question: I have configured an Android application to use the in-app billing module as documented at: http://developer.android.com/guide/market/billing/index.html It works fine when tested using the UK development team's accounts, which have real credit cards associated with them. However, part of my development team is based in China, and as Google Billing does not operate in China, they are unable to test the billing functionality. Understandably the team is uncomfortable sharing personal card details

Differentiating CJK languages (Chinese, Japanese, Korean) in Android

Question: I want to be able to recognize Chinese, Japanese, and Korean written characters, both as a general group and as subdivided languages. These are the reasons: Recognize CJK as a general group: I am making a vertical-script Mongolian TextView. To do that I need to rotate the line of text 90 degrees because the glyphs are stored horizontally in the font. However, for CJK languages, I need to rotate them back again so that they are written in their correct orientation but just stacked on top of
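One common starting point for this kind of recognition is to classify each character by its Unicode block. The sketch below is in Python rather than Android Java and covers only the main blocks (Hiragana, Katakana, Hangul, and the basic CJK Unified Ideographs plus Extension A); the function names and category labels are assumptions made for illustration.

```python
def classify_char(ch):
    """Classify a single character by Unicode block (main blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:
        return "japanese_kana"   # Hiragana and Katakana blocks
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
        return "korean_hangul"   # Hangul syllables and jamo
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "han"             # ideographs shared by Chinese, Japanese, and Korean
    return "other"

def is_cjk(ch):
    """True for any character in the blocks handled above."""
    return classify_char(ch) != "other"

print([classify_char(c) for c in "あカ한中a"])
```

Kana and Hangul identify Japanese and Korean text fairly reliably, but Han ideographs are shared across all three languages, so a character in the "han" category cannot be assigned to one language from its code point alone.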

iPhone CGContextShowTextAtPoint for Japanese characters

I am working on an app where I use CGContextShowTextAtPoint to display text on the screen. I also want to display Japanese characters, but CGContextShowTextAtPoint takes a C string as its input. So either: A) How do I convert Japanese characters into a C string? Or, if this is not possible, B) How can I manually draw Japanese characters to the screen (within the drawRect method)? Thanks in advance.

Rhythmic Fistman: CoreText can help you:

- CTFontGetGlyphsForCharacters (iOS 3.2 onwards) maps Unicode characters to glyphs.
- CTFontDrawGlyphs (iOS 4.2 onwards) draws the glyphs into a CGContext.

NB.