unicode | 易学教程

How to encode 32-bit Unicode characters in a PowerShell string literal?

Question: This Stack Overflow question deals with 16-bit Unicode characters. I would like a similar solution that supports 32-bit characters. See this link for a listing of the various Unicode charts. For example, the Musical Symbols are one range of 32-bit characters. The answer to the question linked above doesn't work because it casts the System.Int32 value to a System.Char, which is a 16-bit type. Edit: Let me clarify that I don't particularly care about displaying the 32-bit Unicode
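The 16-bit limit mentioned in the question can be illustrated with a minimal sketch. It is written in Python rather than PowerShell, and the helper name `to_utf16_units` is hypothetical. It shows how a code point above U+FFFF, such as U+1D11E (MUSICAL SYMBOL G CLEF), is split into two 16-bit UTF-16 code units, which is why a single 16-bit char value cannot hold it.

```python
from typing import List


def to_utf16_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (one or two) for a Unicode code point.

    Hypothetical helper for illustration; it does not validate its input.
    """
    if code_point < 0x10000:
        # Code points in the Basic Multilingual Plane fit in one 16-bit unit.
        return [code_point]
    # Supplementary code points are split into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


G_CLEF = 0x1D11E  # MUSICAL SYMBOL G CLEF
print([hex(unit) for unit in to_utf16_units(G_CLEF)])
```

The same two units come out of Python's built-in codec, `chr(0x1D11E).encode("utf-16-be")`, so the helper is only there to make the arithmetic visible.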

Undocumented Java regex character class: \p{C}

Question: I found an interesting regex in a Java project: "[\\p{C}&&\\S]" I understand that the && means "set intersection", and \S is "non-whitespace", but what is \p{C}, and is it okay to use? The java.util.regex.Pattern documentation doesn't mention it. The only similar class on the list is \p{Cntrl}, but they behave differently: they both match on control characters, but \p{C} matches twice on Unicode characters above U+FFFF, such as PILE OF POO: public class StrangePattern { public static void
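For context, \p{C} denotes the Unicode general category "Other", whose subcategories are Cc (control), Cf (format), Cs (surrogate), Co (private use), and Cn (unassigned). The sketch below uses Python's `unicodedata` module rather than Java's regex engine to inspect that category; the helper name `in_category_c` is hypothetical. It also shows that U+1F4A9 itself is not in category C, while each UTF-16 surrogate code unit is. That may be related to the double match described above, but the excerpt does not confirm it.

```python
import unicodedata


def in_category_c(ch: str) -> bool:
    """True if the character's general category starts with 'C' (Other)."""
    return unicodedata.category(ch).startswith("C")


samples = {
    "NUL control character": "\u0000",
    "ZERO WIDTH SPACE": "\u200b",
    "letter a": "a",
    "PILE OF POO": "\U0001F4A9",
    "lone high surrogate": "\ud83d",
}

for label, ch in samples.items():
    print(label, unicodedata.category(ch), in_category_c(ch))
```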

Identify if a Unicode code point represents a character from a certain script such as the Latin script?

Question: Unicode categorizes characters as belonging to a script, such as the Latin script. How do I test whether a particular character (code point) is in a particular script?

Answer 1: Java represents the various Unicode scripts in the Character.UnicodeScript enum, including for example Character.UnicodeScript.LATIN. These match the Unicode Script Properties. You can test a character by passing its code point to the of method on that enum. int codePoint = "a".codePointAt( 0 ) ;
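Python's standard library does not expose the Unicode Script property, so the following sketch swaps in a different technique: a rough heuristic based on the character's Unicode name. It is not equivalent to Character.UnicodeScript.of, and the helper name `looks_latin` is hypothetical. It only illustrates the idea of classifying a single code point.

```python
import unicodedata


def looks_latin(ch: str) -> bool:
    """Heuristic: treat a character as Latin if its Unicode name starts with 'LATIN '.

    This is an approximation, not the Script property. For example, fullwidth
    Latin letters have names starting with 'FULLWIDTH LATIN' and are missed.
    """
    return unicodedata.name(ch, "").startswith("LATIN ")


for ch in ["a", "ï", "я", "1"]:
    print(repr(ch), unicodedata.name(ch, "<unnamed>"), looks_latin(ch))
```

For exact results, a library that ships the Unicode Scripts data would be needed; none is assumed here.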

Combining accent and character into one character in java 7

Question: I am trying to write Java code that returns a single character combining both a character and an accent. The actual result of combining is a string and not one single character. The following is a simple method to illustrate what I am trying to do. Thank you private char convert (char c) { if (c == '\u0130') { return '\u0069 \u0307'; // If the return value is String I get i. } //I need small i double dot else return c; } Answer 1: Normalizer can decompose/compose your character as you like:
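The decompose/compose behaviour can be sketched with Python's `unicodedata.normalize`, used here as an analogue of Java's Normalizer rather than the answer's own code. The sketch shows that U+0130 decomposes into "I" plus U+0307 (COMBINING DOT ABOVE), that "e" plus U+0301 composes into the single character U+00E9, and that "i" plus U+0307 stays as two code points under NFC because Unicode defines no precomposed character for that pair.

```python
import unicodedata

# NFD splits a precomposed character into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", "\u0130")

# NFC merges a base letter and combining mark when a precomposed form exists.
composed = unicodedata.normalize("NFC", "e\u0301")

# No precomposed form exists for i + COMBINING DOT ABOVE, so NFC leaves it alone.
dotted_i = unicodedata.normalize("NFC", "i\u0307")

print([hex(ord(c)) for c in decomposed])
print([hex(ord(c)) for c in composed])
print(len(dotted_i))
```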

How to replace all unicode characters except for Spanish ones?

Question: I am trying to remove all Unicode characters from a file except for the Spanish characters. Matching the accented vowels has not been an issue, and áéíóúÁÉÍÓÚ are not replaced using the following regex (but all other Unicode appears to be replaced): perl -pe 's/[^áéíóúÁÉÍÓÚ[:ascii:]]//g;' filename But when I add the inverted question mark ¿ or exclamation mark ¡ to the regex, other Unicode characters that I would like removed are also matched and therefore kept: perl -pe 's/[^áéíóúÁÉÍÓÚ
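The intended filtering can be sketched in Python instead of Perl. The truncated Perl command above is left as-is and is not completed here. The sketch assumes the input is already decoded text, so each character in the class is one code point rather than a sequence of bytes. Whether byte-level matching is what causes the problem in the Perl one-liner is not stated in the excerpt. The helper name `strip_other` is hypothetical, and the allowed set is exactly the one the question lists.

```python
import re

# Keep ASCII plus the characters listed in the question; remove everything else.
NOT_ALLOWED = re.compile(r"[^áéíóúÁÉÍÓÚ¿¡\x00-\x7f]")


def strip_other(text: str) -> str:
    """Remove every character that is neither ASCII nor in the allowed set."""
    return NOT_ALLOWED.sub("", text)


print(strip_other("¿Qué? 你好!"))
print(strip_other("¡Hola€"))
```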

Python, how to print Japanese, Korean, Chinese strings

Question: In Python, strings in Japanese, Chinese, and Korean do not print correctly. For example, hello in Japanese, Korean, and Chinese is: こんにちは 안녕하세요 你好 And printing these strings gives: In [1]: f = open('test.txt') In [2]: for _line in f.readlines(): ...: print(_line) ...: こんにちは 안녕하세요 你好 In [3]: f = open('test.txt') In [4]: print(f.readlines()) [ '\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\n', '\xec\x95\x88\xeb\x85\x95\xed\x95\x98\xec\x84\xb8\xec\x9a\x94\n', '\xe4\xbd\xa0\xe5
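The escaped values shown when the list is printed are the UTF-8 bytes of the same text, not corrupted data. A minimal Python 3 sketch, assuming the file is UTF-8 encoded (the excerpt does not state the encoding), decodes the first two byte strings from the output above back into readable text:

```python
# Byte strings copied from the printed list in the question.
japanese_raw = b"\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\n"
korean_raw = b"\xec\x95\x88\xeb\x85\x95\xed\x95\x98\xec\x84\xb8\xec\x9a\x94\n"

# Decoding as UTF-8 (an assumption about the file) recovers the text.
japanese = japanese_raw.decode("utf-8")
korean = korean_raw.decode("utf-8")

print(repr(japanese_raw))  # the escaped form, as seen when printing a list
print(japanese, end="")    # the readable form, as seen when printing a line
print(korean, end="")
```

Printing a list shows the repr of each element, which is why the escapes appear there but not when each line is printed on its own.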

Python 3: CSV files and Unicode Error

Question: I have a CSV (TSV) file with this header "Message Name" "Field" "Base Label" "Base Label Update Date" "Translated Label" "Translated Label Update Date" "Language" "Message" "subject_template" "New Task: Assess Distribution Outcomes for ""${docNameNoLink}"", ""${docNumber}""" "8/10/16 4:17:43 PM" "Nouvelle tâche : évaluez le résultat de la distribution de « ${docNameNoLink} »." "2/17/14 5:09:10 AM" "fr" When I try to read the file with this code import csv with open(fileName, 'r', encoding=

Import all letters of an alphabet in a certain language in python

Question: Is it possible to import all the possible letters (lowercase, uppercase, etc.) of the alphabet of a certain language (Turkish, Polish, Russian, etc.) as a Python list? Is there a module that does that? Thanks & Best Regards Michael

Answer 1: Your question ties into a larger problem: how are the alphabets of certain languages stored in a computer, how are they represented, and (eventually) how can they be retrieved in Python? I suggest you read: The Absolute Minimum Every Software Developer

Is there a way to replace ′ (prime) in a string using str_replace_all?

Question: I'm trying to format various coordinates in degrees/minutes and degrees/minutes/seconds prior to passing them through measurements::conv_unit(), which requires the input as numbers separated by spaces. I've read various answers to similar questions, such as this one: Remove all special characters from a string in R? which led me to initially try: library(tidyverse) latitude <- "-36°48′31.33" str_replace_all(string = latitude, pattern = c("°|'|\"|′|″"), repl = " ") However, the prime symbol (′) is
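The intended result, numbers separated by spaces, can be sketched in Python rather than R. This is an illustration of the substitution, not a fix for the str_replace_all call above. It assumes the degree sign, the ASCII apostrophe and double quote, and the prime (U+2032) and double prime (U+2033) should all become spaces. The helper name `split_coordinate` is hypothetical.

```python
import re

# Degree sign, ASCII apostrophe and quote, prime (U+2032), double prime (U+2033).
SEPARATORS = re.compile("[°'\"\u2032\u2033]")


def split_coordinate(value: str) -> str:
    """Replace coordinate separator symbols with single spaces."""
    return SEPARATORS.sub(" ", value).strip()


latitude = "-36°48\u203231.33"  # same value as in the question
print(split_coordinate(latitude))
```

Writing the prime as the escape `\u2032` avoids confusing it with the ASCII apostrophe, which looks similar in many fonts.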
