unicode-string

UnicodeEncodeError with xlrd

纵饮孤独 submitted on 2019-12-11 04:34:45
Question: I'm trying to read a .xlsx file with xlrd. I have everything set up and working: it works for data with plain English letters as well as numbers. However, when it gets to Swedish letters (ÄÖÅ) it gives me this error:

print str(sheet.cell_value(1, 2)) + " " + str(sheet.cell_value(1, 3)) + " " + str(sheet.cell_value(1, 4)) + " " + str(sheet.cell_value(1, 5))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 1: ordinal not in range(128)

My code: # -*- coding: cp1252 -*-
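Not from the question or its answers, but a minimal Python 2 sketch of the usual fix, assuming the cells are read with xlrd as in the question (the file name is hypothetical): xlrd returns unicode objects, and str() tries to encode them with the ASCII codec, which fails on Ä, Ö and Å. Keeping the values as unicode and encoding only when printing avoids the error.

```python
# -*- coding: utf-8 -*-
# Minimal Python 2 sketch (not the asker's script): avoid str() on unicode
# cell values and encode explicitly at the output boundary instead.
import xlrd

book = xlrd.open_workbook("data.xlsx")   # hypothetical file name
sheet = book.sheet_by_index(0)

values = [sheet.cell_value(1, col) for col in (2, 3, 4, 5)]
line = u" ".join(unicode(v) for v in values)   # stays unicode throughout
print line.encode("utf-8")                     # encode once, when printing
```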

Unable to change an SSIS Excel Destination Column Data Type

巧了我就是萌 submitted on 2019-12-10 18:21:31
Question: I have an SSIS package that imports data from SQL Server and places it into an Excel destination file. Going into the Advanced Editor of the ADO Source component, I have a field Description that has an External Data Type of Unicode String, length 4000, and an Output Data Type of Unicode Text Stream (this is to ensure a string longer than 255 characters can be imported into Excel). Now, when I go into the Advanced Editor of the Excel Destination component, the Data Type is stuck as Unicode String

linking error: undefined reference to icu_50::UnicodeString::UnicodeString()

谁说胖子不能爱 submitted on 2019-12-10 17:16:41
Question: I am trying to compile my project, in which I've declared some class members:

icu::UnicodeString label;
icu::UnicodeString tags;
icu::UnicodeString domain;
icu::UnicodeString data;

after having included (yes, it is found):

#include <unicode/unistr.h>

In my CMakeLists.txt it searches for, finds, and links with icuuc and icudata (libicuuc, libicudata), as the output shows just before the errors are thrown:

-o icarus -rdynamic -lPocoNet -lPocoUtil -lPocoXML -licuuc -licudata

I have built and installed from

Is it advisable to use strcmp or _tcscmp for comparing strings in Unicode versions?

耗尽温柔 submitted on 2019-12-10 12:44:13
Question: Is it advisable to use strcmp or _tcscmp for comparing strings in Unicode versions? Answer 1: _tcscmp() is a macro. If you define UNICODE it will use wcscmp(), otherwise it will use strcmp(). Note that the types TCHAR, PTSTR, etc. behave similarly: they will be WCHAR and PWSTR if you define UNICODE, and CHAR and PSTR otherwise. Answer 2: No, you should use _tcscmp. That will resolve to the proper function depending on your compiler flags. Source: https://stackoverflow.com/questions/2107103/is-it-advisable-to

StrRev() Doesn't Support UTF-8

扶醉桌前 submitted on 2019-12-09 16:13:53
Question: I'm trying to write code that replaces Arabic text so it can be used in programs without Arabic support; as part of that I need to reverse the text after the replacement, but it shows garbage instead of the wanted result. Here is the code:

<?php
$string = "اهلا بك";
echo "$string <br>";
$Reversed = strrev($string);
echo "<br><b>After Reverse</b><br><br>";
echo "<br> $Reversed";
?>

Result:

اهلا بك
After Reverse
�٨� �؄ه٧

I need it the way it is, just reversed, not garbage! Answer 1: in order
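The question is about PHP's strrev(), but the underlying problem, reversing the bytes of a multi-byte UTF-8 string instead of its code points, is language-independent. A small Python 2 illustration of the difference (purely illustrative; Arabic shaping and joining is a separate concern not handled here):

```python
# -*- coding: utf-8 -*-
# Byte-wise reversal garbles multi-byte UTF-8 characters, just like strrev();
# reversing code points keeps each character intact.
s = "اهلا بك"                                    # UTF-8 encoded byte string
print s[::-1]                                    # byte-wise reversal: mojibake
print s.decode("utf-8")[::-1].encode("utf-8")    # code-point-wise reversal
```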

Least used unicode delimiter

冷暖自知 submitted on 2019-12-08 08:19:10
Question: I'm trying to tag my text with a delimiter at specific places that will be used later for parsing. I want to use a delimiter character that is least frequently used. I'm currently looking at "\2", the U+0002 character. Is that safe enough to use? What other suggestions are there? The text is Unicode and will contain both English and non-English characters. I want to use a character that can still be "exploded()" by PHP. Edit: Also, I want to be able to display this piece of text on screen
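The question targets PHP's explode(), but the idea of marking split points with a C0 control character and parsing on it later can be sketched in any language. A minimal Python 2 illustration (not from the question or its answers):

```python
# -*- coding: utf-8 -*-
# U+0002 (START OF TEXT) is a C0 control character that essentially never
# appears in normal text, which is what makes it attractive as a delimiter.
DELIM = u"\u0002"

parts = [u"first chunk", u"zweiter Teil", u"третья часть"]
tagged = DELIM.join(parts)      # tag the text at specific places
print tagged.split(DELIM)       # later parsing recovers the original pieces
```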

Converting character offsets into byte offsets (in Python)

自作多情 submitted on 2019-12-08 04:17:56
Question: Suppose I have a bunch of files in UTF-8 that I send to an external API as unicode. The API operates on each unicode string and returns a list of (character_offset, substr) tuples. The output I need is the begin and end byte offsets for each found substring. If I'm lucky the input text contains only ASCII characters (making character offsets and byte offsets identical), but this is not always the case. How can I find the begin and end byte offsets for a known begin character offset and
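A minimal sketch of one common approach (an assumption, not taken from the question or its answers): since the API reports character offsets into the unicode text, the byte offsets in the UTF-8 file can be recovered by UTF-8-encoding the prefix up to the offset and the matched substring, then measuring their encoded lengths.

```python
# -*- coding: utf-8 -*-
# Sketch: map (character_offset, substr) tuples back to byte offsets in the
# UTF-8 encoding by encoding the prefix and the substring.
def char_to_byte_offsets(text, char_offset, substr):
    """Return (begin_byte, end_byte) for substr starting at char_offset in text."""
    begin = len(text[:char_offset].encode("utf-8"))
    end = begin + len(substr.encode("utf-8"))
    return begin, end

text = u"héllo wörld"
print char_to_byte_offsets(text, 6, u"wörld")   # -> (7, 13)
```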

Replacing a specific Unicode Character in MS SQL Server

落花浮王杯 submitted on 2019-12-08 04:13:23
Question: I'm using MS SQL Server Express 2012. I'm having trouble removing the Unicode character U+02CC (decimal 716) in the grid results. The original text is 'λeˌβár'. I tried it like this, but it doesn't work:

SELECT ColumnTextWithUnicode, REPLACE(ColumnTextWithUnicode, 'ˌ', '') FROM TableName

The column has Latin1_General_CI_AS collation and its datatype is nvarchar. I tried changing the collation to something binary, but with no success either:

SELECT ColumnTextWithUnicode, REPLACE(ColumnTextWithUnicode

Converting Unicode objects with non-ASCII symbols in them into strings objects (in Python)

こ雲淡風輕ζ submitted on 2019-12-07 07:05:18
Question: I want to send Chinese characters to an online service to be translated and get the resulting English string returned. I'm using simple JSON and urllib for this. And yes, I am declaring # -*- coding: utf-8 -*- at the top of my code. Now, everything works fine if I feed urllib a str-type object, even if that object contains what would be Unicode information. My function is called translate. For example:

stringtest1 = '無與倫比的美麗'
print translate(stringtest1)

results in the proper translation
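Not shown in the excerpt above, but the usual Python 2 explanation is that urllib works on byte strings, so a unicode object has to be encoded (typically to UTF-8) before it goes into the request. A minimal sketch under that assumption; the translation service itself is left out and the parameter name "q" is hypothetical:

```python
# -*- coding: utf-8 -*-
# Sketch: encode unicode text to UTF-8 bytes before handing it to urllib;
# str() would try to encode it as ASCII and raise UnicodeEncodeError.
import urllib

def to_query(text):
    if isinstance(text, unicode):
        text = text.encode("utf-8")       # unicode object -> UTF-8 byte string
    return urllib.urlencode({"q": text})  # "q" is a made-up parameter name

print to_query(u"無與倫比的美麗")   # unicode input now works
print to_query("無與倫比的美麗")    # byte-string input, as in stringtest1
```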

Python escape sequence \N{name} not working as per definition

风格不统一 submitted on 2019-12-07 06:54:32
Question: I am trying to print Unicode characters given their name, as follows:

# -*- coding: utf-8 -*-
print "\N{SOLIDUS}"
print "\N{BLACK SPADE SUIT}"

However, the output I get is not very encouraging: the escape sequence is printed as-is.

ActivePython 2.7.2.5 (ActiveState Software Inc.) based on Python 2.7.2 (default, Jun 24 2011, 12:21:10) [MSC v.1500 32 bit (Intel)] on
Type "help", "copyright", "credits" or "license" for more information.
>>> # -*- coding: utf-8 -*-
... print "\N{SOLIDUS}"
\N
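A minimal Python 2 sketch of the likely cause (an inference, not quoted from the question's answers): in Python 2 the \N{NAME} escape is only interpreted inside unicode literals, so the strings need a u prefix.

```python
# -*- coding: utf-8 -*-
# In Python 2, \N{...} works only in unicode literals (u"...");
# in a plain byte-string literal the characters are left verbatim.
print u"\N{SOLIDUS}".encode("utf-8")            # -> /
print u"\N{BLACK SPADE SUIT}".encode("utf-8")   # -> ♠
print "\N{SOLIDUS}"                             # prints \N{SOLIDUS} unchanged
```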