unicode | 易学教程

UnicodeDecodeError when using socket.gethostname() result

Question: Some of my users report that the following code can raise a UnicodeDecodeError when the hostname contains non-ASCII characters (however, I haven't been able to reproduce this on my Windows Vista machine):

    self.path = path
    self.lock_file = os.path.abspath(path) + ".lock"
    self.hostname = socket.gethostname()
    self.pid = os.getpid()
    dirname = os.path.dirname(self.lock_file)
    self.unique_name = os.path.join(dirname, "%s.%s" % (self.hostname, self.pid))

The last part of the traceback is: File
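
A minimal Python 3 sketch, assuming the error comes from mixing a byte-string hostname with a text path (a common cause in Python 2, where socket.gethostname() returns a byte string). It normalizes every piece to text before formatting the lock name. The helper names to_text and unique_lock_name, and the UTF-8 decoding choice, are hypothetical and not part of the original code.

```python
import os
import socket


def to_text(value, encoding="utf-8"):
    # Decode byte strings; text passes through unchanged.
    # UTF-8 with errors="replace" is an assumed choice: the real
    # hostname encoding depends on the platform.
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value


def unique_lock_name(path, hostname=None, pid=None):
    # Hypothetical helper mirroring the snippet: <lock file dir>/<hostname>.<pid>
    if hostname is None:
        hostname = socket.gethostname()
    if pid is None:
        pid = os.getpid()
    lock_file = os.path.abspath(to_text(path)) + ".lock"
    dirname = os.path.dirname(lock_file)
    return os.path.join(dirname, "%s.%s" % (to_text(hostname), pid))


if __name__ == "__main__":
    # A non-ASCII hostname passed as UTF-8 bytes no longer breaks the join.
    print(unique_lock_name("data.txt", hostname="héte".encode("utf-8"), pid=42))
```

The design choice is to convert at the boundary (where the hostname enters the program) so that all later string formatting works on one type.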

Karate - How to deal with unicode characters?

Question: I want to send a Unicode string as a request parameter, like this:

    {"mobile": "۹۸.۹۱۲۳۴۳۰۴۱۲"}

but Karate sends it like this instead:

    {"mobile": "??.??????????"}

I've also tried reading the text ۹۸.۹۱۲۳۴۳۰۴۱۲ from a file and sending it this way:

    * def persianMobile1 = read('classpath:account/unicode/persian.mobile.txt')
    Given url karate.get('urlBase') + "account/activateMobileByVerificationCode"
    And request
    """
    { "mobile":#(persianMobile1), "code":#
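
The "?" characters are what you typically get when text is encoded with a charset that cannot represent it. The question does not say which charset Karate used, so this Python sketch uses ASCII as an assumed stand-in to show the effect, then contrasts it with UTF-8 and with \uXXXX-escaped JSON. It illustrates the encoding behavior only; it is not Karate code or configuration.

```python
import json

mobile = "۹۸.۹۱۲۳۴۳۰۴۱۲"  # Persian (Extended Arabic-Indic) digits from the question
payload = json.dumps({"mobile": mobile}, ensure_ascii=False)

# Encoding with a charset that cannot represent these characters
# (ASCII here, as an assumed stand-in) turns each one into "?".
lossy = payload.encode("ascii", errors="replace")

# Encoding the same JSON as UTF-8 keeps every character intact.
utf8_body = payload.encode("utf-8")

# Alternatively, \uXXXX escapes make the JSON pure ASCII and still round-trip.
escaped = json.dumps({"mobile": mobile})

if __name__ == "__main__":
    print(lossy)  # b'{"mobile": "??.??????????"}'
    print(json.loads(utf8_body.decode("utf-8"))["mobile"] == mobile)  # True
```

Only the dot survives the lossy encoding because it is the single ASCII character in the value.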

Dumping unicode with YAML

Question: I'm creating YAML files from CSV files that contain a lot of Unicode characters, but I can't get the data to dump without a decode error. I'm using the ruamel.yaml library:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 11: ordinal not in range(128)

I've tried parsing strings, unicode strings, and encoding with "utf-8"; nothing seems to work. I've seen a lot of examples that add a representer to solve the issue, but they all seem to be using
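
A minimal Python 3 sketch, assuming the CSV files are UTF-8: byte 0xc2 is the lead byte of a two-byte UTF-8 sequence, so decoding those bytes as ASCII fails. Decoding with the file's real encoding before parsing gives the YAML dumper text rather than bytes. The CSV content and the helper names are made up for illustration, and the ruamel.yaml dump step is deliberately left out.

```python
import csv
import io

# Made-up CSV bytes: "£" is U+00A3, which UTF-8 encodes as 0xC2 0xA3.
RAW = "price,item\n£5,tea\n".encode("utf-8")


def decode_as_ascii(data: bytes) -> str:
    # Reproduces this kind of error: 0xC2 is not a valid ASCII byte.
    return data.decode("ascii")


def read_rows(data: bytes) -> list:
    # Decode with the file's real encoding (assumed UTF-8) before parsing,
    # so downstream code only ever receives text, never undecoded bytes.
    text = data.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    try:
        decode_as_ascii(RAW)
    except UnicodeDecodeError as exc:
        # 'ascii' codec can't decode byte 0xc2 in position 11: ordinal not in range(128)
        print(exc)
    print(len(read_rows(RAW)))  # 1
```

With a real file, the equivalent is opening it as text with an explicit encoding, e.g. open(path, newline="", encoding="utf-8"), and passing the csv reader's rows on to the dumper.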

UTF-8 is one implementation of Unicode

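For the sake of unification, Unicode was invented: it brings every symbol in the world's writing systems into one set and gives each symbol a unique code. Unicode's code space now has room for more than a million symbols (1,114,112 code points, U+0000 to U+10FFFF), so all languages can interoperate and a single web page can display text in many languages at once. However, the character set alone does not specify how those codes are stored as bytes, and as a result several different storage formats (encodings) for Unicode appeared.

UTF-8 encodes Unicode in units of bytes. Unicode code points map to UTF-8 as follows:

    Unicode code point (hex)   UTF-8 byte sequence (binary)
    000000-00007F              0xxxxxxx
    000080-0007FF              110xxxxx 10xxxxxx
    000800-00FFFF              1110xxxx 10xxxxxx 10xxxxxx
    010000-10FFFF              11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

UTF-8 is a variable-length encoding, and a Chinese character takes 3 or 4 bytes in it: common characters (for example the range U+4E00 to U+9FFF) fall in the three-byte row, while rarer ones in the supplementary planes take four. As a supplement, here is how UTF-8 sequences are laid out. (The five- and six-byte forms come from the original UTF-8 design; RFC 3629 restricts UTF-8 to at most four bytes, matching the U+10FFFF limit in the table.)

    One byte:    0*******
    Two bytes:   110*****, 10******
    Three bytes: 1110****, 10******, 10******
    Four bytes:  11110***, 10******, 10******, 10******
    Five bytes:  111110**, 10******, 10******, 10******, 10******
    Six bytes:   1111110*, 10******, 10******, 10*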
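
To make the bit layout concrete, here is a small Python sketch that encodes a single code point by filling the x positions according to the code-point ranges listed above, checked against Python's built-in str.encode("utf-8"). The function name utf8_encode is hypothetical, and it follows the modern four-byte limit rather than the older five- and six-byte forms.

```python
def utf8_encode(cp: int) -> bytes:
    # The x bits of each pattern are filled from the code point's binary
    # value, most significant bits first.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogates and values above U+10FFFF are not encodable (RFC 3629).
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp <= 0x7F:  # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # U+4E2D, a common Chinese character, takes three bytes.
    print(utf8_encode(0x4E2D).hex())  # e4b8ad
```

Each continuation byte carries 6 payload bits behind its 10 prefix, which is why the shifts step down by 6.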

Multilingual Support in Windows XP

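Back when we used the English editions of Windows 95 and Windows 98, getting the English operating system to display and process Chinese content correctly required installing a dedicated Chinese platform such as 中文之星 (Chinese Star) or RichWin. Windows XP removes that hassle: it supports multiple languages directly and only needs a few settings changed.

Displaying Chinese correctly

The first problem to solve is displaying Chinese correctly. Open Control Panel, click "Date, Time, Language, and Regional Options", and then click "Regional and Language Options". In the window that opens, go to the "Languages" tab, select "Install files for East Asian languages" (Figure 1), and click "Apply". The system may ask you to insert the operating system installation CD to copy some files, and it restarts when it is done. After the restart, all Chinese files and web pages open and work normally.

Many people online say that installing the Chinese language pack is enough for an English edition of Windows XP to display Chinese correctly, but this is not true. First, the Chinese language pack only switches the English operating system's interface from English to Chinese, and installing it also requires East Asian language support to be installed first. Second, language packs can only be installed on the English edition of Windows XP Professional; the Home edition cannot install them. Finally, the multilingual language packs are not sold or offered for download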

Python3 UnicodeEncodeError when run via Synology task scheduler

Question: I get a Python 3 UnicodeEncodeError when I run my script via the Synology task scheduler, but not when I run the same script from the command line (over PuTTY). Why is this, and how can I solve it? A simple test script:

    import sys
    print(sys.version)  # to confirm the correct Python version
    print("Fichier non trouvé♠ #M–Nein")  # to test non-ASCII characters
    test = "Fichier non trouvé♠ #M–Nein"
    print("test is " + test)
    test2 = str(test)  # to test if the str function causes an issue
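
A likely explanation, stated as an assumption: scheduled tasks often run without the interactive shell's locale, so Python may choose a non-UTF-8 encoding (for example ASCII) for sys.stdout, and print() then raises UnicodeEncodeError. This sketch reproduces that with an in-memory ASCII stream and switches it to UTF-8 with TextIOWrapper.reconfigure() (Python 3.7+). The helper names force_utf8 and demo are hypothetical.

```python
import io
import sys

TEXT = "Fichier non trouvé♠ #M–Nein"


def force_utf8(stream):
    # TextIOWrapper.reconfigure() exists in Python 3.7+; streams without it
    # (e.g. a replaced sys.stdout) are returned unchanged.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


def demo():
    # In-memory stand-in for a stdout opened under an ASCII locale (assumption).
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="ascii", write_through=True)
    try:
        out.write(TEXT)
    except UnicodeEncodeError as exc:
        print("ASCII stream failed:", exc.reason)
    force_utf8(out)
    out.write(TEXT)
    out.flush()
    return raw.getvalue()


if __name__ == "__main__":
    force_utf8(sys.stdout)  # the same fix applied to the real stdout
    print(TEXT)
```

If you would rather not change the script, setting the environment variable PYTHONIOENCODING=utf-8 in the scheduled task's command, or running the interpreter in UTF-8 mode with python3 -X utf8, should have a similar effect.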

Converting to Emoji

Question: I am trying to take data that uses Unicode escape sequences and print it with emojis. It is currently in a .txt file, but I will write it to an Excel file later. However, I am getting an error I am not sure how to handle. This is the text I am reading:

    "Thanks @UglyGod \ud83d\ude4f https:\\/\\/t.co\\/8zVVNtv1o6\"
    "RT @Rosssen: Multiculti beatdown \ud83d\ude4f https:\\/\\/t.co\\/fhwVkjhFFC\"

And here is my code:

    sampleFile = open('tweets.txt', 'r').read()
    splitFile = sampleFile.split('\n')
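
The sequence \ud83d\ude4f is a JSON-style UTF-16 surrogate pair for a single emoji (U+1F64F). A minimal Python 3 sketch, assuming each line in the file uses single-backslash JSON escapes (the displayed sample shows doubled backslashes and stray \" characters, so the exact escaping in the real file is an assumption): wrapping the line in quotes and parsing it as a JSON string resolves both \uXXXX pairs and \/ into real characters. The helper name decode_escapes is hypothetical.

```python
import json


def decode_escapes(line: str) -> str:
    # Wrap the line in quotes and let the JSON parser resolve the escapes:
    # \uXXXX sequences (a surrogate pair such as \ud83d\ude4f becomes one
    # character) and \/ (an escaped slash).
    # Assumes the line contains no unescaped double quotes.
    return json.loads('"' + line + '"')


if __name__ == "__main__":
    raw = r"Thanks @UglyGod \ud83d\ude4f https:\/\/t.co\/8zVVNtv1o6"
    # ascii() keeps the output printable on any console.
    print(ascii(decode_escapes(raw)))  # 'Thanks @UglyGod \U0001f64f https://t.co/8zVVNtv1o6'
```

Applied to a file, this could look like reading it with open("tweets.txt", encoding="utf-8") and calling decode_escapes(line.rstrip("\n")) on each line.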

utf-8 and ActionMailer

Question: I'm facing some problems with UTF-8 and ActionMailer. My application has a contact form that sends an email to me when it is submitted. The problem is that when somebody enters characters like öäüß, I receive the message encoded like this, for example:

    =?UTF-8?Q?funktioniert_oder_nicht.=0D=0A=0D=0Ameine_Stra=C3=9Fe_ist_die?= =?UTF-8?Q?_Bratwurststra=C3=9Fe=0D=0A=0D=0A=C3=B6=C3=A4?=

As I understand it, ActionMailer is UTF-8-ready by default. Analyzing the log from my server, when the form is
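
The =?UTF-8?Q?...?= form is a MIME "encoded word" as defined in RFC 2047: in the Q encoding, "_" stands for a space and =XX is one byte in hex, so =C3=9F is the UTF-8 encoding of ß. This Python sketch uses the standard library's email.header module only to show that the text decodes correctly; it is not ActionMailer code, and the helper name decode_mime_words is hypothetical.

```python
from email.header import decode_header, make_header


def decode_mime_words(value: str) -> str:
    # RFC 2047 encoded words look like =?charset?Q?text?=; decode_header
    # splits them into (bytes, charset) parts and make_header joins the text.
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    print(decode_mime_words("=?UTF-8?Q?Bratwurststra=C3=9Fe?="))  # Bratwurststraße
    print(decode_mime_words("=?UTF-8?Q?meine_Stra=C3=9Fe?="))     # meine Straße
```

In other words, the characters themselves survive intact; what you are seeing is the raw header encoding rather than decoded text.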

Why does this regex return true?

Question: Why does this regex return true?

    Regex.IsMatch("العسكرية", "العسكري")

I googled and nothing came up.

Answer 1: I suspect what you posted is actually reversed, where the shorter text is in fact the pattern and the longer text is the input being matched against. In that case, this would return true, since the pattern matches everything but the last letter of the word. To clarify, العسكري is the pattern, and العسكرية is the input. Since I know Arabic, I can tell you that the latter would indeed be a
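
.NET's static Regex.IsMatch(input, pattern) takes the input first and returns true if the pattern matches anywhere in it, so a pattern that is a prefix of the input matches. This Python sketch uses the re module purely as an analogue (it is not the .NET API): re.search is unanchored like IsMatch, while re.fullmatch requires the whole input to match, like wrapping the .NET pattern in ^...$.

```python
import re

text = "العسكرية"    # input: ends with ة (teh marbuta)
pattern = "العسكري"  # pattern: the same word without the final letter

# Unanchored search, like .NET Regex.IsMatch: a match anywhere in the input counts.
found_anywhere = re.search(pattern, text) is not None

# Anchored match, like ^...$ in .NET: the whole input must match.
matches_whole = re.fullmatch(pattern, text) is not None

if __name__ == "__main__":
    print(found_anywhere, matches_whole)  # True False
```

If an exact match is what you want, anchoring the pattern is the usual fix; a plain string comparison also works when the pattern contains no regex syntax.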