unicode | 易学教程

How to get rid of `Wide character in print at`?

Question: I have a file /tmp/xxx with the following content:

    00000000 D0 BA D0 B8 │ D1 80 D0 B8 │ D0 BB D0 B8 │ D0 BA к и р и л и к

When I read the file and print its content, I get the warning: Wide character in print at ...

The source is:

    use utf8;
    open my $fh, '<:encoding(UTF-8)', '/tmp/xxx';
    print scalar <$fh>;

The output from print is: кирилик

Answer 1: You're printing to STDOUT, which isn't expecting UTF-8. Add

    binmode(STDOUT, ":encoding(UTF-8)");

to change that on the already opened handle.

Answer 2: The use utf8 means Perl…

How to find accented characters in a string in Python?

Question: I have a file of sentences. Some of them are in Spanish and contain accented letters (e.g. é) or special characters (e.g. ¿). I need to search for these characters in each sentence so I can determine whether the sentence is in Spanish or English. I've tried my best to do this but haven't managed to get it right. Below is one of the solutions I tried, which clearly gave the wrong answer:

    sentence = '¿Qué tipo es el?'  # in str format, received from standard open file method
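
The thread's answers are not shown here, but one possible approach is the minimal sketch below. It assumes Python 3, where a str read with the correct encoding is already Unicode text. The sketch normalizes each sentence to NFD, which splits letters such as é or ñ into a base letter plus a combining mark. It then treats any combining mark, or the inverted punctuation ¿ and ¡, as a hint of Spanish. The names `has_accented_char`, `looks_spanish` and `SPANISH_PUNCTUATION` are hypothetical, chosen for illustration.

```python
import unicodedata

# Hypothetical set: punctuation that suggests Spanish but carries no accent.
SPANISH_PUNCTUATION = {"¿", "¡"}


def has_accented_char(text: str) -> bool:
    """Return True if any character decomposes into a base letter plus a combining mark."""
    # NFD turns e.g. 'é' into 'e' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero combining class only for combining marks.
    return any(unicodedata.combining(ch) for ch in decomposed)


def looks_spanish(sentence: str) -> bool:
    """Heuristic: accented letters or inverted punctuation suggest Spanish."""
    return has_accented_char(sentence) or any(ch in SPANISH_PUNCTUATION for ch in sentence)


if __name__ == "__main__":
    for s in ["¿Qué tipo es el?", "What type is it?"]:
        print(s, looks_spanish(s))
```

This is only a heuristic. English text containing loanwords such as "café" will be flagged, and Spanish sentences without accents or inverted punctuation will be missed.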

How to encode Python 3 string using \u escape code?

Question: In Python 3, suppose I have

    >>> thai_string = 'สีเ'

Using encode gives

    >>> thai_string.encode('utf-8')
    b'\xe0\xb8\xaa\xe0\xb8\xb5'

My question: how can I get encode() to return a bytes sequence that uses \u instead of \x? And how can I decode it back to a Python 3 str? I tried the ascii builtin, which gives

    >>> ascii(thai_string)
    "'\\u0e2a\\u0e35'"

But this doesn't seem quite right, because I can't decode it back to obtain thai_string. The Python documentation tells me that \xhh escapes the
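
One standard-library option is Python's `unicode_escape` codec, which may or may not be what the thread's answers go on to recommend. It encodes a str to bytes using `\uXXXX` escapes, and decoding with the same codec restores the original string. The sketch uses only the two code points U+0E2A and U+0E35 that appear in the question's outputs.

```python
# The two Thai code points shown in the question's outputs: U+0E2A, U+0E35.
thai_string = "\u0e2a\u0e35"

# 'unicode_escape' replaces non-ASCII code points with \uXXXX escapes, as bytes.
escaped = thai_string.encode("unicode_escape")
print(escaped)  # b'\\u0e2a\\u0e35'

# Decoding with the same codec reverses the transformation.
restored = escaped.decode("unicode_escape")
print(restored == thai_string)  # True
```

Two caveats apply. Code points from U+0080 to U+00FF come out as `\xhh` rather than `\u00hh`. Also, the decoder treats any unescaped non-ASCII bytes as Latin-1, so the round trip is only reliable on bytes that `encode("unicode_escape")` produced.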

Reading files with a BOM in Go

Question: I need to read Unicode files that may or may not contain a byte-order mark (BOM). I could of course check the first few bytes of the file myself and discard a BOM if I find one. But before I do, is there a standard way of doing this, either in the core libraries or in a third-party package?

Answer 1: There's no standard way, IIRC (and the standard library would really be the wrong layer to implement such a check in), so here are two examples of how you could handle it yourself. One is to use a buffered reader above

Unicode support for Invoke-Sqlcmd in PowerShell

Question: The PowerShell sqlps module provides core support for SQL Server access from within PowerShell. Its Invoke-Sqlcmd cmdlet is the main workhorse for executing literal queries or SQL script files, analogous to the non-PowerShell sqlcmd utility. I recently ran some experiments to confirm that Invoke-Sqlcmd handles Unicode and got some surprising results. I started with this simple script file (named unicode.sql):

    CREATE TABLE #customers (
      [IdCust] int,
      [FirstName] nvarchar(25),
      [SurName]

How to save a UTF-16 with BOM file with Inno Setup

Question: How do I save a string to a text file in UTF-16 (UCS-2) encoding with a BOM? SaveStringsToUTF8File saves it as UTF-8, and using streams saves it as ANSI:

    var
      i: integer;
    begin
      for i := 1 to length(aString) do
      begin
        Stream.write(aString[i], 1);
        Stream.write(#0, 1);
      end;
      stream.free;
    end;

Answer 1: The Unicode string in the Unicode version of Inno Setup (the only version as of Inno Setup 6) already uses the UTF-16 LE encoding. So all you need to do is copy the (Unicode) string to a byte array (
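
The byte layout the answer relies on can be shown outside Inno Setup. The sketch below is Python, not Inno Setup's Pascal Script, and only illustrates what the finished file should contain: the BOM bytes FF FE followed by each UTF-16 code unit, low byte first. The helper name `utf16le_with_bom` is hypothetical.

```python
import codecs


def utf16le_with_bom(text: str) -> bytes:
    """Return the bytes of a UTF-16 LE text file with a leading BOM."""
    # codecs.BOM_UTF16_LE is b'\xff\xfe'; 'utf-16-le' emits code units low byte first, without a BOM.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


data = utf16le_with_bom("Hi")
print(data.hex(" "))  # ff fe 48 00 69 00
```

Every character in the Basic Multilingual Plane takes two bytes, and characters outside it take four (a surrogate pair). A scheme that emits one byte per character followed by a zero byte can therefore only be correct for characters up to U+00FF.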

How to print C++ wstring UTF-8 characters to Mac OS or Unix terminal?

Question: How can I print a std::wstring using std::wcout? I tried the following, which was recommended here, but it only works for printing ¡Hola!, not 日本:

    #include <iostream>
    #include <clocale>

    int main(int argc, char* argv[])
    {
      char* locale = setlocale(LC_ALL, "");
      std::cout << "locale: " << locale << std::endl; // "C" for me
      std::locale lollocale(locale);
      setlocale(LC_ALL, locale);
      std::wcout.imbue(lollocale);
      std::wcout << L"¡Hola!" << std::endl; // ok
      std::wcout << L"日本" << std: