utf-8 | 易学教程

Byte array in objective c with ascii encoding

Question: I am trying to get a byte array from an NSString in Objective-C using ASCII encoding. I need this array to calculate the SHA256 hash of that string and then compare the result to the SHA256 hash generated on Windows.
    NSString *myString = @"123456¥";
    const char *string = (const char *) [myString cStringUsingEncoding:NSASCIIStringEncoding];
This always gives nil, since the string contains the ¥ character. The problem is that I cannot use UTF8Encoding, since the hash generated by Windows uses
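A minimal Python sketch of why the ASCII conversion in the question comes back empty: ¥ (U+00A5) has no ASCII representation, and a SHA-256 digest depends on which bytes the string is encoded to. The excerpt is cut off before it names the encoding used on the Windows side, so Latin-1 below is only an assumed stand-in for a single-byte encoding.

```python
import hashlib

text = "123456¥"

# ASCII covers only U+0000..U+007F, so encoding ¥ (U+00A5) must fail.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The same text yields different bytes under different encodings ...
utf8_bytes = text.encode("utf-8")      # ¥ becomes two bytes: C2 A5
latin1_bytes = text.encode("latin-1")  # ¥ becomes one byte: A5 (assumed stand-in)

# ... and therefore different SHA-256 digests.
utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()

print(ascii_ok, len(utf8_bytes), len(latin1_bytes), utf8_digest == latin1_digest)
```

Two systems only produce the same hash if they agree on the byte encoding before hashing.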

Read file with UTF-8 in Haskell as IO String

Question: I have the following code, which works fine unless the file contains UTF-8 characters:
    module Main where
    import Ref
    main = do
      text <- getLine
      theInput <- readFile text
      writeFile ("a" ++ text) (unlist . proc . lines $ theInput)
With UTF-8 characters I get this:
    hGetContents: invalid argument (invalid byte sequence)
Since the file I'm working with has UTF-8 characters, I would like to handle this exception in order to reuse the functions imported from Ref if possible. Is there a way to read a UTF-8
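The error in the question is what a decoder reports when UTF-8 bytes are read under an encoding that cannot represent them. The sketch below reproduces that situation in Python rather than Haskell, purely as an illustration: the file name sample.txt and its contents are made up, and the strict ASCII decoder stands in for whatever locale encoding the asker's system used.

```python
import os
import tempfile

# Hypothetical sample file containing non-ASCII text, written as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("naïve café\n")

# Reading with a decoder that cannot handle the bytes fails ...
try:
    with open(path, encoding="ascii") as f:
        f.read()
    failed = False
except UnicodeDecodeError:
    failed = True

# ... while naming the encoding explicitly succeeds.
with open(path, encoding="utf-8") as f:
    content = f.read()

print(failed, content)
```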

Issue in DocumentTermMatrix with corpus in German

Question: I created a corpus in R using the tm package, specifying language and encoding as follows:
    de_DE.corpus <- Corpus(VectorSource(de_DE.sample), readerControl = list(language = "de_DE", encoding = "UTF_8"))
    de_DE.corpus[36]$content
    de_DE.dtm <- DocumentTermMatrix(de_DE.corpus, control = list(encoding = 'UTF-8'))
    inspect(de_DE.dtm[, grepl("grÃ", de_DE.dtm$dimnames$Terms)])
    inspect(de_DE.dtm[36, ])
If I look at de_DE.corpus[36]$content, the content of document 36, which contains 'ü', the text is shown correctly. e
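The search pattern "grÃ" in the question is the typical trace of UTF-8 bytes being decoded as Latin-1. A small Python sketch of that effect follows; the word "grün" is an assumed example, since the excerpt does not show the actual term.

```python
word = "grün"  # assumed example word containing 'ü'

# 'ü' (U+00FC) is the two bytes C3 BC in UTF-8.
raw = word.encode("utf-8")

# Decoding those bytes as Latin-1 turns each byte into its own character.
garbled = raw.decode("latin-1")

# Reversing the wrong step restores the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled, repaired)
```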

How to read.table with “Hebrew” column names (in R)?

Question: I am trying to read a .txt file with Hebrew column names, but without success. I uploaded an example file to: http://www.talgalili.com/files/aa.txt and am trying the command:
    read.table("http://www.talgalili.com/files/aa.txt", header = T, sep = "\t")
This returns:
      X.....ª X...ª...... X...œ....
    1      12          97         6
    2     123         354        44
    3       6           1         3
Instead of:
    אחת שתיים שלוש
    12 97 6
    123 354 44
    6 1 3
My output for l10n_info() is:
    $MBCS
    [1] FALSE
    $`UTF-8`
    [1] FALSE
    $`Latin-1`
    [1] TRUE
    $codepage
    [1] 1252
And
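The garbled headers are consistent with a UTF-8 file being decoded one byte per character, which fits the single-byte locale reported by l10n_info(). The Python sketch below shows this for the first column name only; Latin-1 is used as an approximate stand-in for code page 1252, which is an assumption made for illustration.

```python
name = "אחת"  # first column name from the expected output

# Each Hebrew letter takes two bytes in UTF-8: D7 90, D7 97, D7 AA.
raw = name.encode("utf-8")

# Decoded one byte per character, the three letters become six characters.
garbled = raw.decode("latin-1")

# Decoding with the right encoding recovers the name.
restored = raw.decode("utf-8")

print(len(garbled), garbled[-1], restored)
```

The trailing ª matches the last character of the first garbled header, X.....ª.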

Jekyll encoding name of category special characters

Question: My Jekyll installation used to work. Since an update, I have had an issue with URLs containing tag names that have special characters. I now get an error message when trying to reach a URL with special characters in it, like http://127.0.0.1:4000/tag/Actualit%C3%A9%20europ%C3%A9enne/, where Actualité européenne is the name of a category. The error message is: incompatible character encodings: UTF-8 and ASCII-8BIT. All the files in the _posts directory are UTF-8. Here is the stack trace: [2017
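The URL in the question carries the category name as percent-encoded UTF-8 bytes. As a Python illustration (not the Jekyll or Ruby code path), the sketch below shows the difference between the raw bytes of the path segment and the text obtained by decoding them as UTF-8. Mixing undecoded bytes with UTF-8 text appears to be the kind of mismatch the error message describes.

```python
from urllib.parse import unquote, unquote_to_bytes

segment = "Actualit%C3%A9%20europ%C3%A9enne"

# Percent-decoding alone yields raw bytes with no encoding attached.
raw = unquote_to_bytes(segment)

# Interpreting those bytes as UTF-8 yields the category name.
name = raw.decode("utf-8")

# unquote() does both steps, assuming UTF-8 by default.
same = unquote(segment)

print(raw, name, same == name)
```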

Whitespace in a database field is not removed by trim()

Question: I have some whitespace at the beginning of a paragraph in a text field in MySQL. Using trim($var_text_field) in PHP or TRIM(text_field) in MySQL statements does absolutely nothing. What could this whitespace be, and how do I remove it in code? If I go into the database and backspace it out, it saves properly. It's just not being removed by the trim() functions.
Answer 1:
    function UberTrim($s) {
        $s = preg_replace('/\xA0/u', ' ', $s); // strips UTF-8 NBSP: "\xC2\xA0"
        $s = trim($s);
        return $s;
    }
The
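The answer targets the no-break space (U+00A0), which is stored as the bytes C2 A0 in UTF-8 and is not among the ASCII whitespace characters that a byte-oriented trim removes. A Python sketch of the same effect follows; the field value is invented for illustration.

```python
# Hypothetical field value: NBSP followed by the paragraph text.
field = "\u00a0First paragraph"
raw = field.encode("utf-8")

# bytes.strip() removes only ASCII whitespace, so the C2 A0 prefix stays.
byte_trimmed = raw.strip()

# str.strip() works on characters and treats U+00A0 as whitespace.
text_trimmed = field.strip()

print(byte_trimmed[:2], text_trimmed)
```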

C++ crash when use setmode with _O_U8TEXT to deal with unicode

Question: What I've tried in order to print Unicode is:
    _setmode(_fileno(stdout), _O_U8TEXT);
    string str = u8"unicode 한글 hangul";
    cout << str << endl;
I used _setmode to show and read Unicode correctly, but it crashed with a debug assertion failure. However,
    _setmode(_fileno(stdout), _O_U16TEXT);
    wstring str = L"unicode 한글 hangul";
    wcout << str << endl;
with _O_U16TEXT compiles and prints correctly. What should I do to use UTF-8? Do I have to find another trick?
Answer 1: _setmode mentions _O_U8TEXT and _O_U16TEXT (finally), but
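For reference, the two modes in the question correspond to different byte encodings of the same text. The Python sketch below only shows those encodings for the question's string; it does not reproduce the Windows C runtime behaviour behind the assertion.

```python
text = "unicode 한글 hangul"

# UTF-8: ASCII characters take 1 byte, each Hangul syllable takes 3.
utf8 = text.encode("utf-8")

# UTF-16 (little-endian, no BOM): every character here takes 2 bytes.
utf16 = text.encode("utf-16-le")

print(len(text), len(utf8), len(utf16))
```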