unicode

How to get rid of `Wide character in print at`?

北慕城南 submitted on 2021-02-07 13:24:59
Question: I have a file /tmp/xxx with the following content:

    00000000  D0 BA D0 B8 │ D1 80 D0 B8 │ D0 BB D0 B8 │ D0 BA   к и р и л и к

When I read the file's content and print it, I get the warning:

    Wide character in print at ...

The source is:

    use utf8;
    open my $fh, '<:encoding(UTF-8)', '/tmp/xxx';
    print scalar <$fh>;

The output from print is: кирилик

Answer 1: You're printing to STDOUT, which isn't expecting UTF-8. Add

    binmode(STDOUT, ":encoding(UTF-8)");

to change that on the already opened handle.

Answer 2: The use utf8 means Perl
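Perl's rule here, roughly, is as follows. Once the `:encoding(UTF-8)` input layer has decoded the file, the string holds characters rather than bytes. Printing a character above 255 to a handle with no encoding layer then triggers the "Wide character" warning. The same decode/encode round trip on the bytes from the hex dump can be sketched in Python, which is used for all examples added to this page. `decode_input` and `encode_output` are illustrative names, not part of either language's API.

```python
# The 14 bytes shown in the hex dump of /tmp/xxx.
RAW = bytes.fromhex("D0 BA D0 B8 D1 80 D0 B8 D0 BB D0 B8 D0 BA")

def decode_input(raw: bytes) -> str:
    """Bytes -> characters, analogous to Perl's '<:encoding(UTF-8)' read layer."""
    return raw.decode("utf-8")

def encode_output(text: str) -> bytes:
    """Characters -> bytes, analogous to ':encoding(UTF-8)' on STDOUT."""
    return text.encode("utf-8")

text = decode_input(RAW)
# Two bytes per Cyrillic letter on disk, one character each after decoding.
print(len(RAW), len(text))  # 14 7
# Every code point is above 0xFF, so none fits in one byte without an encoding.
assert all(ord(ch) > 0xFF for ch in text)
# Encoding on output restores exactly the original bytes.
assert encode_output(text) == RAW
```

The output in the question still looks correct because Perl falls back to writing the string's internal UTF-8 bytes while emitting the warning. The `binmode` call makes that encoding step explicit.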

How to find accented characters in a string in Python?

佐手、 submitted on 2021-02-07 12:56:52
Question: I have a file with sentences, some of which are in Spanish and contain accented letters (e.g. é) or special characters (e.g. ¿). I need to search for these characters in each sentence so I can determine whether the sentence is in Spanish or English.

I've tried my best to accomplish this, but have had no luck getting it right. Below is one of the solutions I tried, which clearly gave the wrong answer.

    sentence = '¿Qué tipo es el?'  # in str format, received from the standard open file method
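One way to detect such characters, sketched here as an assumption rather than taken from an answer, is to Unicode-decompose each character (NFD) and look for combining marks, plus an explicit check for ¿ and ¡. It assumes the file was opened as text with the right encoding, e.g. `open(path, encoding="utf-8")`. `has_spanish_marks` is a hypothetical helper name.

```python
import unicodedata

# Spanish punctuation that carries no accent; 'ñ' and accented vowels are
# caught by the decomposition check below instead.
SPANISH_PUNCTUATION = {"¿", "¡"}

def has_spanish_marks(sentence: str) -> bool:
    """Return True if the sentence contains ¿, ¡, or any letter with a diacritic."""
    for ch in sentence:
        if ch in SPANISH_PUNCTUATION:
            return True
        # NFD splits 'é' into 'e' + U+0301 COMBINING ACUTE ACCENT and 'ñ' into
        # 'n' + U+0303 COMBINING TILDE; both marks have general category 'Mn'.
        if any(unicodedata.category(c) == "Mn"
               for c in unicodedata.normalize("NFD", ch)):
            return True
    return False
```

This is only a heuristic. Any diacritic counts, so English text with loanwords such as "café" would be flagged, and a Spanish sentence without accents would not be.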

How to encode Python 3 string using \u escape code?

被刻印的时光 ゝ submitted on 2021-02-07 12:38:48
Question: In Python 3, suppose I have

    >>> thai_string = 'สีเ'

Using encode gives

    >>> thai_string.encode('utf-8')
    b'\xe0\xb8\xaa\xe0\xb8\xb5'

My question: how can I get encode() to return a bytes sequence using \u instead of \x? And how can I decode them back to a Python 3 str type?

I tried using the ascii builtin, which gives

    >>> ascii(thai_string)
    "'\\u0e2a\\u0e35'"

But this doesn't seem quite right, as I can't decode it back to obtain thai_string.

Python documentation tells me that \xhh escapes the
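A possible approach, not taken from an answer on this page, uses Python's built-in `unicode_escape` codec. It encodes non-ASCII characters in the Basic Multilingual Plane as `\uXXXX` and decodes them back. Code points U+0080 to U+00FF come out as `\xhh` instead. Decoding treats non-escape bytes as Latin-1, so it is only safe on ASCII input such as the codec's own output. The `ascii()` result can also be reversed with `ast.literal_eval`, because it is a valid string literal. The helper names below are illustrative, and the sample string uses the two code points from the question's `ascii()` output.

```python
import ast

def to_u_escapes(text: str) -> bytes:
    """Encode text as ASCII bytes, using \\uXXXX escapes for BMP characters above U+00FF."""
    return text.encode("unicode_escape")

def from_u_escapes(data: bytes) -> str:
    """Reverse to_u_escapes(); only safe for ASCII-only input such as its output."""
    return data.decode("unicode_escape")

thai = "\u0e2a\u0e35"  # the two code points shown in the question's ascii() output
encoded = to_u_escapes(thai)
print(encoded)  # b'\\u0e2a\\u0e35'
assert from_u_escapes(encoded) == thai

# ascii() returns a quoted Python string literal, which literal_eval parses back.
assert ast.literal_eval(ascii(thai)) == thai
```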

Reading files with a BOM in Go

五迷三道 submitted on 2021-02-07 12:32:51
Question: I need to read Unicode files that may or may not contain a byte-order mark. I could of course check the first few bytes of the file myself and discard a BOM if I find one. But before I do, is there any standard way of doing this, either in the core libraries or in a third-party package?

Answer 1: There's no standard way, IIRC (and the standard library would really be the wrong layer to implement such a check in), so here are two examples of how you could deal with it yourself. One is to use a buffered reader above
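The check the asker describes (inspect the first bytes and discard a BOM) is independent of the language. Below it is sketched in Python rather than Go, comparing the leading bytes against the standard BOMs. `split_bom` is a hypothetical helper that returns a guessed encoding name and the remaining bytes. It assumes UTF-8 when no BOM is found. For UTF-8 alone, Python's `utf-8-sig` codec already drops a leading BOM when decoding.

```python
import codecs
from typing import Tuple

# UTF-32 LE (FF FE 00 00) must be tested before UTF-16 LE (FF FE),
# because the UTF-16 LE BOM is a prefix of it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def split_bom(data: bytes, default: str = "utf-8") -> Tuple[str, bytes]:
    """Return (encoding, data without BOM); fall back to `default` if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return default, data
```

A file that starts with neither mark is passed through unchanged, so the caller still has to know or guess its encoding.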

Unicode support for Invoke-Sqlcmd in PowerShell

别等时光非礼了梦想. submitted on 2021-02-07 11:58:37
Question: The PowerShell sqlps module provides core support for SQL Server access from within PowerShell, and its Invoke-Sqlcmd cmdlet is its main workhorse for executing literal queries or SQL script files (analogous to the non-PowerShell sqlcmd utility). I recently ran some experiments to confirm that Invoke-Sqlcmd handles Unicode and got some surprising results.

I started with this simple script file (named unicode.sql):

    CREATE TABLE #customers ( [IdCust] int, [FirstName] nvarchar(25), [SurName]

How to save a UTF-16 file with a BOM in Inno Setup

天大地大妈咪最大 submitted on 2021-02-07 10:28:58
Question: How can I save a string to a text file with UTF-16 (UCS-2) encoding and a BOM? SaveStringsToUTF8File saves it as UTF-8. Using streams saves it as ANSI:

    var
      i: integer;
    begin
      for i := 1 to length(aString) do
      begin
        Stream.write(aString[i], 1);
        Stream.write(#0, 1);
      end;
      stream.free;
    end;

Answer 1: As the Unicode string (in the Unicode version of Inno Setup, the only version as of Inno Setup 6) actually uses the UTF-16 LE encoding, all you need to do is to copy the (Unicode) string to a byte array (
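To make the target concrete, here is the byte layout such a file should have, sketched in Python rather than Inno Setup's Pascal Script. It is the two-byte BOM `FF FE` followed by the text encoded as UTF-16 LE. This shows only the expected bytes, not the Inno Setup code the answer goes on to describe. `utf16le_with_bom` is an illustrative name. Characters outside the Basic Multilingual Plane take four bytes (a surrogate pair), which UCS-2 proper cannot represent.

```python
import codecs

def utf16le_with_bom(text: str) -> bytes:
    """Bytes of a UTF-16 LE text file with BOM: FF FE, then 2 or 4 bytes per character."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# "ab" -> FF FE 61 00 62 00: each ASCII letter gets a trailing zero byte.
assert utf16le_with_bom("ab") == b"\xff\xfea\x00b\x00"
```

Writing the result with `open(path, "wb")` produces the file. The explicit `utf-16-le` codec plus BOM avoids depending on the platform byte order that Python's plain `utf-16` codec uses.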

How to print C++ wstring UTF-8 characters to a Mac OS or Unix terminal?

走远了吗. submitted on 2021-02-07 10:24:37
Question: How can I print a std::wstring using std::wcout? I tried the following, which was recommended here, but it works only for printing ¡Hola! and not 日本:

    #include <iostream>
    #include <clocale>

    int main(int argc, char* argv[])
    {
        char* locale = setlocale(LC_ALL, "");
        std::cout << "locale: " << locale << std::endl; // "C" for me
        std::locale lollocale(locale);
        setlocale(LC_ALL, locale);
        std::wcout.imbue(lollocale);
        std::wcout << L"¡Hola!" << std::endl; // ok
        std::wcout << L"日本" << std: