unicode

Why does C99 have such an odd restriction for universal character names?

孤人 submitted on 2021-02-19 04:26:45
Question: 6.4.3 Universal character names: A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive. Besides the fact that it is no longer "universal" with restrictions like this, I can't think of a good reason for such a restriction. Does anyone know the backstory? Answer 1: D800 through DFFF inclusive are not valid code points; they are high and low surrogates, which can only be
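
The rule quoted from 6.4.3 can be written down as a small predicate. The following is a minimal Python sketch, not compiler code: the function names `is_allowed_c99_ucn` and `encodes_as_utf8` are hypothetical, and only the two restrictions quoted above are checked (any upper bound on the value is left out because the quoted text does not state one). The second helper illustrates the point made in the answer: a lone surrogate cannot be encoded as UTF-8.

```python
def is_allowed_c99_ucn(code_point: int) -> bool:
    """Apply the two restrictions quoted from C99 6.4.3 (hypothetical helper)."""
    # Surrogate range D800..DFFF is excluded outright.
    if 0xD800 <= code_point <= 0xDFFF:
        return False
    # Below 00A0, only $, @ and ` may be written as universal character names.
    if code_point < 0x00A0:
        return code_point in (0x0024, 0x0040, 0x0060)
    return True


def encodes_as_utf8(code_point: int) -> bool:
    """Return True if Python can encode this single code point as UTF-8."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Raised for lone surrogates such as U+D800.
        return False


if __name__ == "__main__":
    for cp in (0x0024, 0x0041, 0x00E9, 0xD800):
        print(f"U+{cp:04X}", is_allowed_c99_ucn(cp), encodes_as_utf8(cp))
```

Note that `encodes_as_utf8(0x41)` is true while `is_allowed_c99_ucn(0x41)` is false: the restriction on values below 00A0 is a language rule, not an encoding limitation.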

How do I match only fully-composed characters in a Unicode string in Perl?

怎甘沉沦 submitted on 2021-02-18 22:10:27
Question: I'm looking for a way to match only fully composed characters in a Unicode string. Is [:print:] dependent upon locale in any regular expression implementation that incorporates this character class? For example, will it match the Japanese character 'あ', since it is not a control character, or is [:print:] always going to be ASCII codes 0x20 to 0x7E? Is there any character class, including Perl REs, that can be used to match anything other than a control character? If [:print:] includes only
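
The two properties the question asks about can be tested directly against the Unicode character database. The sketch below uses Python's standard `unicodedata` module rather than Perl, and it rests on two assumptions: "control character" is read as general category Cc, and "fully composed" is read as "contains no combining marks after NFC normalization". The helper names are hypothetical.

```python
import unicodedata


def is_control(ch: str) -> bool:
    """True if the character has Unicode general category Cc (control)."""
    return unicodedata.category(ch) == "Cc"


def has_combining_marks(text: str) -> bool:
    """True if any character has a non-zero canonical combining class."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


def is_fully_composed(text: str) -> bool:
    """Assumed reading of 'fully composed': NFC form with no combining marks left."""
    return unicodedata.normalize("NFC", text) == text and not has_combining_marks(text)


if __name__ == "__main__":
    print(is_control("あ"))                 # a Japanese letter is not a control character
    print(is_fully_composed("\u00e9"))      # precomposed e-acute
    print(is_fully_composed("e\u0301"))     # e followed by a combining acute accent
```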

MS Excel Vba Arabic unicode

北城余情 submitted on 2021-02-18 19:42:38
Question: I have a text file and a macro-enabled Excel file. The Excel file reads (using VBA) each line of the text file as a string (Arabic text) and puts it into the cells of Sheet1. The problem is that the string is not displayed properly: it shows up as random Japanese characters. (My Windows locale is Japan.) Here is my code: Open FilePath For Input As #1 Do Until EOF(1) Line Input #1, textline ActiveWorkbook.Sheets(1).Cells(1, 1).Value = textline 'MsgBox(textline) Loop Close #1 Question: How can I get the

UTF8, codepoints, and their representation in Erlang and Elixir

时光毁灭记忆、已成空白 submitted on 2021-02-18 19:13:07
Question: I'm going through Elixir's handling of Unicode: iex> String.codepoints("abc§") ["a", "b", "c", "§"] Very good, and byte_size/1 of this is not 4 but 5, because the last char takes 2 bytes; I get that. The ? operator (or is it a macro? I can't find the answer) tells me that iex(69)> ?§ 167 Great; so then I look into the UTF-8 encoding table and see the value c2 a7 as the hex encoding for the char. That means the two bytes (as witnessed by byte_size/1) are c2 (194 in decimal) and a7 (167 in decimal). That
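
The arithmetic in the question can be reproduced outside Elixir. This is a small Python sketch (the helper name `encode_two_byte_utf8` is hypothetical) that derives the two UTF-8 bytes of § from its code point 167 by hand and compares them with the built-in encoder. It covers only the two-byte range U+0080 to U+07FF.

```python
def encode_two_byte_utf8(code_point: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte UTF-8 range is handled here")
    lead = 0xC0 | (code_point >> 6)            # 110xxxxx: the top 5 bits
    continuation = 0x80 | (code_point & 0x3F)  # 10xxxxxx: the low 6 bits
    return bytes([lead, continuation])


if __name__ == "__main__":
    text = "abc§"
    print([ord(ch) for ch in text])        # code points, one per character
    print(len(text.encode("utf-8")))       # byte length of the UTF-8 form
    print(list(encode_two_byte_utf8(ord("§"))))
```

For § the hand-built bytes are 0xC2 and 0xA7, i.e. 194 and 167. The second byte equals the code point only by coincidence: 167 is 0b10100111, whose low six bits with the 10 prefix happen to give the same value.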

django python collation error

两盒软妹~` submitted on 2021-02-18 12:08:47
Question: What is the reason for the following error? When I try to filter with: if MyObject.objects.filter(location = aDictionary['address']): where location is defined as: location = models.CharField(max_length=100, blank=True, default='') I get the following error when aDictionary['address'] contains a string with a non-alphanumeric character (for example Kīhei): File "/usr/lib/pymodules/python2.6/MySQLdb/connections.py", line 35, in defaulterrorhandler raise errorclass, errorvalue _mysql

What is the Best UTF [closed]

送分小仙女□ submitted on 2021-02-18 10:26:06
Question: I'm really confused about UTF in Unicode. There are UTF-8, UTF-16 and UTF-32. My question is: which UTFs support all Unicode
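
As a point of reference for this question: UTF-8, UTF-16 and UTF-32 can each represent every Unicode character; they differ in how many bytes a given character takes. A minimal Python sketch (the helper name `encoded_sizes` is hypothetical; the little-endian codec variants are used so that no byte order mark is counted):

```python
def encoded_sizes(ch: str) -> dict:
    """Return the byte length of one character in each of the three encoding forms."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u3042", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```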

Colored diacritics and unicode behaviour

吃可爱长大的小学妹 submitted on 2021-02-18 07:52:12
Question: I just stumbled over this question about coloring diacritics. The task was to give diacritics a different color from the base text, for example showing the a of á in blue and the ´ in red. I thought I could give it a try by separating letter and diacritic with Unicode combining marks and applying another color to the diacritics by wrapping them in a span, like this: <p> p<span>̄ </span> o<span>̄ </span> m<span>̃ </span> o<span>̃ </span> d<span>̈ </span> o<span>̈ </span> r<span>̌ </span> o<span>̌ <
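
The separation step described in the question, splitting a precomposed letter into a base letter plus combining marks, can be sketched with Unicode normalization. This is an illustrative Python example only: the helper name `split_base_and_marks` is hypothetical, and it says nothing about how a browser will render or color the separated mark.

```python
import unicodedata


def split_base_and_marks(ch: str) -> tuple:
    """Decompose one character (NFD) into (base characters, combining marks)."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if unicodedata.combining(c) == 0)
    marks = "".join(c for c in decomposed if unicodedata.combining(c) != 0)
    return base, marks


if __name__ == "__main__":
    base, marks = split_base_and_marks("\u00e1")  # precomposed a-acute
    print(base, [f"U+{ord(m):04X}" for m in marks])
```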

Colored diacritics and unicode behaviour

怎甘沉沦 submitted on 2021-02-18 07:52:11
Question: I just stumbled over this question about coloring diacritics. The task was to give diacritics a different color from the base text, for example showing the a of á in blue and the ´ in red. I thought I could give it a try by separating letter and diacritic with Unicode combining marks and applying another color to the diacritics by wrapping them in a span, like this: <p> p<span>̄ </span> o<span>̄ </span> m<span>̃ </span> o<span>̃ </span> d<span>̈ </span> o<span>̈ </span> r<span>̌ </span> o<span>̌ <

Using Beautiful Soup with accents and different characters

最后都变了- submitted on 2021-02-18 07:44:11
Question: I'm using Beautiful Soup to pull medal winners from past Olympics. It's tripping over the accents in some of the event and athlete names. I've seen similar problems posted online, but I'm new to Python and having trouble applying them to my code. If I print my soup, the accents appear fine, but when I start parsing the soup (and writing it to a CSV file) the accented characters become garbled: 'Louis Perrée' becomes 'Louis Perr√©e'. from BeautifulSoup import BeautifulSoup import urllib2
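
The garbled sample in the question is consistent with UTF-8 bytes being read back as Mac OS Roman, although the truncated snippet does not confirm that this is what happened. The Python 3 sketch below reproduces the symptom under that assumption; the helper names are hypothetical and no Beautiful Soup call is involved.

```python
def misread_utf8_as_mac_roman(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as Mac OS Roman."""
    return text.encode("utf-8").decode("mac_roman")


def repair_mac_roman_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_utf8_as_mac_roman("Louis Perrée")
    print(garbled)
    print(repair_mac_roman_mojibake(garbled))
```

If the assumption holds, the fix on the writing side is to make the encoding explicit when the CSV file is written and again when it is opened, rather than repairing the text afterwards.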

ggsave losing unicode characters from ggplot+gridExtra

血红的双手。 submitted on 2021-02-18 05:16:48
Question: More code than you really need, but to set the mood: #Make some data and load packages data<-data.frame(pchange=runif(80,0,1),group=factor(sample(c(1,2,3),80,replace=T))) library(dplyr) library(magrittr) library(gridExtra) library(ggplot2) data%<>%arrange(group,pchange) %>% mutate(num=1:80) #Make plot that includes unicode characters g1<-ggplot(data, aes(factor(num),pchange, fill = group,width=.4)) + geom_bar(stat="identity", position = "dodge") + theme_classic()+ theme(axis.ticks = element