cjk

Unicode fonts in PyGame

纵然是瞬间, submitted on 2019-11-27 06:04:50
Question: How can I display Chinese characters in PyGame? And what's a good free/libre font to use for this purpose?

Answer 1: pygame uses SDL_ttf for rendering, so you should be in fine shape as far as rendering goes. unifont.org appears to have some extensive resources on open-source fonts for a range of scripts. I grabbed the Cyberbit pan-Unicode font and extracted the included TTF. The following "worked on my machine", which is Windows Vista Home Basic with Python 2.6:

    # -*- coding: utf-8 -*-
    import pygame, sys

Is there any good open-source or freely available Chinese segmentation algorithm available? [closed]

折月煮酒, submitted on 2019-11-27 05:04:57
Question: As the title says, I'm looking for a free and/or open-source text-segmentation algorithm for Chinese. I understand it is a very difficult task to solve, as there are many ambiguities involved. I know there's Google's API, but it is rather a black box, i.e. there is not much information about what it is doing ...

How to do a Python split() on languages (like Chinese) that don't use whitespace as word separator?

会有一股神秘感。, submitted on 2019-11-27 04:26:23
Question: I want to split a sentence into a list of words. For English and European languages this is easy, just use split():

    >>> "This is a sentence.".split()
    ['This', 'is', 'a', 'sentence.']

But I also need to deal with sentences in languages such as Chinese that don't use whitespace as a word separator.

    >>> u"这是一个句子".split()
    [u'\u8fd9\u662f\u4e00\u4e2a\u53e5\u5b50']

Obviously that doesn't work. How do I split such a sentence into a list of words?

UPDATE: So far the answers seem to suggest that this ...
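
One common baseline for this kind of splitting is greedy forward maximum matching against a word list. The sketch below is only an illustration in Python 3: the function name `segment`, the `max_len` parameter, and the toy vocabulary `VOCAB` are all hypothetical, and a real segmenter would need a large dictionary or a statistical model to handle ambiguity.

```python
def segment(sentence, vocabulary, max_len=4):
    """Greedy forward maximum matching.

    Characters that start no known word are emitted as single tokens.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"这", "是", "一个", "句子"}

print(segment("这是一个句子", VOCAB))
```

With the toy vocabulary, the sentence from the question comes out as four tokens. Greedy matching picks the longest dictionary hit at each position, so it can choose the wrong split when two readings overlap.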

How to print Chinese words in my code using Python

隐身守侯, submitted on 2019-11-27 04:01:09
This is my code:

    print '哈哈'.decode('gb2312').encode('utf-8')

...and it prints:

    SyntaxError: Non-ASCII character '\xe5' in file D:\zjm_code\a.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

How do I print '哈哈'?

Update: When I use the following code:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    print '哈哈'

...it prints 鍝堝搱. That isn't what I wanted to get. My IDE is Ulipad; is this a bug in the IDE?

Second update: This code prints the characters correctly:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    print u'哈哈'.encode('gb2312')

...and when I use ...
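
Garbled output of this kind is typical when UTF-8 bytes are shown by a console that expects a different encoding. The Python 3 sketch below reproduces the effect under one assumption that the question does not confirm: that the console decodes bytes as GBK (for example Windows code page 936).

```python
text = "哈哈"

# Bytes as a UTF-8 encoded source file would store the literal.
utf8_bytes = text.encode("utf-8")

# Assumption: the console interprets those bytes as GBK.
garbled = utf8_bytes.decode("gbk")

# Undoing the wrong decode recovers the original text.
recovered = garbled.encode("gbk").decode("utf-8")

print(len(utf8_bytes), len(garbled))
print(garbled)
print(recovered == text)
```

Six UTF-8 bytes are read as three two-byte GBK characters, which would explain why two characters turn into three. Encoding to the console's own encoding before printing, as in the second update, avoids the mismatch.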

Understanding Python Unicode and Linux terminal

不想你离开。, submitted on 2019-11-27 02:53:47
Question: I have a Python script that writes some strings with UTF-8 encoding. In my script I mainly use the str() function to cast to string. It looks like this:

    # japanesevalues is a list of unicode values, I am sure it is unicode
    mystring = "this is unicode string:" + japanesevalues[1]
    print mystring

I don't use the Python terminal, just the standard Linux Red Hat x86_64 terminal. I set the terminal to output UTF-8 characters. If I execute this:

    # python myscript.py
    this is unicode string: カラダーズ ソフィー

But ...

Android default charset when sending http post/put - Problems with special characters

我们两清, submitted on 2019-11-27 00:56:58
Question: I have configured the Apache HttpClient like so:

    HttpProtocolParams.setContentCharset(httpParameters, "UTF-8");
    HttpProtocolParams.setHttpElementCharset(httpParameters, "UTF-8");

I also include the HTTP header "Content-Type: application/json; charset=UTF-8" for all HTTP POST and PUT requests. I am trying to send HTTP POST/PUT requests with a JSON body that contains special characters (i.e. Chinese characters via the Google Pinyin keyboard, symbols, etc.). The characters appear as gibberish in ...

PHP: check if the string has Chinese chars

天涯浪子, submitted on 2019-11-27 00:09:58
I have the string $str and I want to check whether its content has Chinese chars or not (true/false):

    $str = "赕就可消垻,只有当所有方块都被消垻时才可以过关";

Can you please help me? Thanks! Adrian

You could use a Unicode character class (http://www.regular-expressions.info/unicode.html):

    preg_match("/\p{Han}+/u", $utf8_str);

This just checks for the presence of at least one Chinese character. You might want to expand on this if you want to match the complete string.

eaglewu

@mario's answer is right! For Chinese chars use this regex:

    /[\x{4e00}-\x{9fa5}]+/u

And don't forget the u modifier! About the u modifier: reference. Thanks to ...
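
For comparison, the same range check can be sketched in Python 3. The built-in `re` module does not support `\p{Han}`, so this version uses the explicit U+4E00 to U+9FA5 range from the second regex; the helper name `has_chinese` is made up for the example.

```python
import re

# Same code point range as the /[\x{4e00}-\x{9fa5}]+/u pattern.
HAN_RANGE = re.compile("[\u4e00-\u9fa5]")


def has_chinese(text):
    """Return True if text has at least one character in the range."""
    return HAN_RANGE.search(text) is not None


print(has_chinese("只有当所有方块都被消"))
print(has_chinese("plain ASCII text"))
```

The range covers the main CJK Unified Ideographs block up to U+9FA5 only; ideographs in the extension blocks fall outside it.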

How to use Chinese and Japanese characters as strings in Java?

こ雲淡風輕ζ, submitted on 2019-11-26 22:51:19
Hi, I am using Java. I have to use some Chinese and Japanese characters in a string and print them using System.out.println(). How can I do that? Thanks

Java Strings support Unicode, so Chinese and Japanese are no problem. Other tools (such as text editors) and your OS shell probably need to be told about it, though. When reading or printing Unicode data, you have to make sure that the console or stream also supports Unicode (otherwise the characters will likely be replaced with question marks).

    Writer unicodeFileWriter = new OutputStreamWriter(
        new FileOutputStream("a.txt"), "UTF-8");

How can I detect certain Unicode characters in a string in Ruby?

纵然是瞬间, submitted on 2019-11-26 22:44:24
Question: Given a string in Ruby 1.8.7 (without the awesome Oniguruma regular expression engine that supports Unicode properties with \p{}), I would like to be able to determine whether the string contains one or more Chinese, Japanese, or Korean characters; i.e.

    class String
      def contains_cjk?
        ...
      end
    end

    >> '日本語'.contains_cjk?
    => true
    >> '광고 프로그램'.contains_cjk?
    => true
    >> '艾弗森将退出篮坛'.contains_cjk?
    => true
    >> 'Watashi ha bakana gaijin desu.'.contains_cjk?
    => false

I suspect that this will boil down to seeing ...
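
Without Unicode property support, one option is to compare each character's code point against a table of block ranges. The sketch below shows the idea in Python 3 rather than Ruby; the `CJK_RANGES` table is an assumed, non-exhaustive selection of Unicode blocks, and `contains_cjk` simply mirrors the method name from the question.

```python
# Assumed, non-exhaustive set of code point ranges treated as "CJK" here.
CJK_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def contains_cjk(text):
    """Return True if any character falls inside one of CJK_RANGES."""
    return any(
        low <= ord(char) <= high
        for char in text
        for low, high in CJK_RANGES
    )


for sample in ("日本語", "광고 프로그램", "艾弗森将退出篮坛",
               "Watashi ha bakana gaijin desu."):
    print(sample, contains_cjk(sample))
```

The table is easy to extend with further blocks (extension ideographs, Hangul Jamo, half-width forms) if those need to count as CJK too.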

How to keep the Chinese or other foreign language as they are instead of converting them into codes?

旧时模样, submitted on 2019-11-26 22:26:49
Question: DOMDocument seems to convert Chinese characters into codes; for instance, 你的乱发 becomes ä½ çš„ä¹±å‘. How can I keep Chinese or other foreign-language text as it is instead of having it converted into codes? Below is my simple test:

    $dom = new DOMDocument();
    $dom->loadHTML($html);

If I add the line below before loadHTML(),

    $html = mb_convert_encoding($html, "HTML-ENTITIES", "UTF-8");

I get &#20320;&#30340;&#20081;&#21457;. Even though the converted codes are displayed as Chinese characters, &#20320;&#30340;&#20081;&#21457; is still not the 你的乱发 that I ...