unicode

Escaping regex unicode string in Python

孤者浪人 submitted on 2021-02-04 21:23:18
Question: I have a user-defined string that I want to use in a regex, with one small improvement: match any of three apostrophe characters instead of just one. For example:

APOSTROPHES = re.escape('\'\u2019\u02bc')
word = re.escape("п'ять")
word = ''.join([s if s not in APOSTROPHES else '[%s]' % APOSTROPHES for s in word])

This works well for Latin text, but for Unicode text the list comprehension gives the following string: "[\\'\\\\u2019\\\\u02bc]\xd0[\\'\\\\u2019\\\\u02bc]\xbf[\\'\\\\u2019\\\\u02bc][\\'\\\\u2019\\\\u02bc][\\'\\\\u2019\\\
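A minimal sketch of one way to build such a pattern, assuming Python 3 (where `str` is Unicode text). The helper name `build_pattern` and the constant `APOSTROPHE_CLASS` are hypothetical and not from the question. The idea is to test each raw character against the unescaped apostrophe set and only then escape it, so the backslashes added by `re.escape` are never mistaken for apostrophes, which appears to be what happens in the output quoted above.

```python
import re

# ASCII apostrophe, right single quotation mark, modifier letter apostrophe.
APOSTROPHES = "'\u2019\u02bc"

# Character class matching any one of the three apostrophes (hypothetical name).
APOSTROPHE_CLASS = "[" + re.escape(APOSTROPHES) + "]"


def build_pattern(word):
    """Return a regex source string for `word` that accepts any apostrophe variant."""
    # Compare the raw character first, escape afterwards: escaping the whole
    # word up front would mix backslashes into the characters being compared.
    return "".join(
        APOSTROPHE_CLASS if ch in APOSTROPHES else re.escape(ch)
        for ch in word
    )


pattern = re.compile(build_pattern("п'ять"))

for candidate in ("п'ять", "п\u2019ять", "п\u02bcять", "пять"):
    print(candidate, bool(pattern.fullmatch(candidate)))
```

In Python 2 the same approach would need `unicode` literals (`u"..."`) throughout, since iterating over a byte string yields individual UTF-8 bytes rather than characters.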

Perl regular expression matching on large Unicode code points

折月煮酒 submitted on 2021-02-04 18:16:26
Question: I am trying to replace various characters with either a single quote or a double quote. Here is my test file:

# Replace all with double quotes
＂ fullwidth
“ left
” right
„ low
" normal
# Replace all with single quotes
' normal
‘ left
’ right
‚ low
‛ reverse
` backtick

I'm trying to do this...

perl -Mutf8 -pi -e "s/[\x{2018}\x{201A}\x{201B}\x{FF07}\x{2019}\x{60}]/'/ug" test.txt
perl -Mutf8 -pi -e 's/[\x{FF02}\x{201C}\x{201D}\x{201E}]/"/ug' text.txt

But only the backtick character gets replaced

Displaying other language characters in PyQt

可紊 submitted on 2021-02-04 16:47:43
Question: Is there a way to display other language characters in PyQt4? If there is, what approach or direction should I take? Thanks in advance.
Answer 1: Qt uses Unicode and should be able to display (Unicode) text in any language you have a suitable font for. For example, Roberto Alesina's simple "Hello World" program on the PyQt Wiki -- which I transcribe for readability (and w/o the comments for brevity) since it's pretty unreadable in the wiki -- should let you use as the button's text any

How do I convert HTML percent-encoding to Unicode, with XSLT?

吃可爱长大的小学妹 submitted on 2021-02-04 14:07:53
Question: There are tons of entries and answers online about this, but they all go in the opposite direction from what I need. From my iTunes XML, I have thousands of percent-encoded entries, in multiple languages, that I'm trying to convert to Unicode text with an XSLT stylesheet. Is there any function or process that I'm missing, other than tracking down every single character and doing a replace? Here is a small sample of the variety that I'm working with; the first line is the

Using Chinese to build a dictionary in Python

我是研究僧i submitted on 2021-02-04 13:53:29
Question: This is my first time here, and I am also new to the world of Python. I am studying Chinese as well, and I wanted to create a program to review Chinese vocabulary using a dictionary. Here is the code that I normally use:

#!/usr/bin/python
# -*- coding: utf-8 -*-
dictionary = {"Hello" : "你好"} # Simple example to save time
print(dictionary)

The results I keep getting are something like:

{'hello': '\xe4\xbd\xa0\xe5\xa5\xbd'}

I have also tried adding a "u" to the beginning of the string with the
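A short sketch of what is going on, assuming Python 3 (the helper name `format_entries` is hypothetical). In Python 2, printing a whole dict shows the `repr` of each value, so a UTF-8 byte string appears as `\x..` escapes even though the data is intact. Printing the individual values, or serializing with `json.dumps(..., ensure_ascii=False)`, displays the characters themselves.

```python
# -*- coding: utf-8 -*-
import json

dictionary = {"Hello": "你好"}

# The escaped form from the question is the UTF-8 encoding of the same text.
decoded = b"\xe4\xbd\xa0\xe5\xa5\xbd".decode("utf-8")
print(decoded == dictionary["Hello"])  # True


def format_entries(d):
    """Format each key/value pair as readable text instead of relying on repr()."""
    return ["%s: %s" % (key, value) for key, value in d.items()]


lines = format_entries(dictionary)
for line in lines:
    print(line)  # Hello: 你好

# ensure_ascii=False keeps non-ASCII characters unescaped in the output.
as_json = json.dumps(dictionary, ensure_ascii=False)
print(as_json)  # {"Hello": "你好"}
```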

Visually-identical characters in Unicode

微笑、不失礼 submitted on 2021-02-04 13:44:25
Question: I want to find the visually identical characters for a specific character in Unicode. I know how to find the canonical or compatibility decompositions of a character, but they do not give me what I want. I want to find characters that are visually identical (not merely similar), whose only difference can be their size. For example, I want (s,S) or (S,S) (whose code points are different). I do not want (ß, β) or (e, é). Any suggestions? Thanks.
Answer 1: For a particular character, you could start from
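To illustrate the limitation the question describes, here is a rough sketch (not taken from the truncated answer) that lists every code point whose NFKC compatibility normalization equals a given character. The function name `compatibility_lookalikes` is hypothetical. It finds variants such as the fullwidth `Ｓ` (U+FF33), but misses look-alikes from other scripts such as Cyrillic `Ѕ` (U+0405), which has no decomposition to Latin `S`.

```python
import unicodedata


def compatibility_lookalikes(target):
    """Return all single code points (other than target) that NFKC-normalize to target."""
    found = []
    for cp in range(0x110000):
        # Skip the surrogate range, which does not hold real characters.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        ch = chr(cp)
        if ch != target and unicodedata.normalize("NFKC", ch) == target:
            found.append(ch)
    return found


lookalikes = compatibility_lookalikes("S")

print("\uff33" in lookalikes)  # fullwidth S is found
print("\u0405" in lookalikes)  # Cyrillic DZE is not found
```

The scan covers the whole code space, so it takes a moment to run; the exact result set depends on the Unicode version bundled with the Python interpreter.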