utf-8

Byte array in Objective-C with ASCII encoding

安稳与你 submitted on 2021-02-19 08:37:52
Question: I am trying to get a byte array from an NSString in Objective-C using ASCII encoding. I need this array to calculate the SHA256 hash of that string and then compare the result to the SHA256 hash generated in Windows. NSString *myString = @"123456¥"; const char *string = [myString cStringUsingEncoding:NSASCIIStringEncoding]; This always gives nil because the string contains the ¥ character. The problem is I cannot use UTF8Encoding since the hash generated by Windows uses
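The ¥ sign (U+00A5) is outside 7-bit ASCII, so a strict ASCII conversion has nothing to map it to. To match the Windows hash, both sides must hash the same bytes, and different encodings produce different bytes. The sketch below uses Python for brevity. It assumes, purely for illustration, that the Windows side used code page 1252; the question is cut off before it names the encoding.

```python
import hashlib

text = "123456¥"  # the string from the question; ¥ is U+00A5

# Strict ASCII encoding cannot represent U+00A5, mirroring the nil result in Objective-C
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = text.encode("utf-8")     # ¥ becomes two bytes: 0xC2 0xA5
cp1252_bytes = text.encode("cp1252")  # ¥ becomes one byte: 0xA5 (assumed Windows code page)

# Different byte sequences give different SHA-256 digests
for name, data in (("utf-8", utf8_bytes), ("cp1252", cp1252_bytes)):
    print(name, data.hex(), hashlib.sha256(data).hexdigest())
```

If Windows-1252 does turn out to be the encoding in use, the Objective-C side would need the matching constant, NSWindowsCP1252StringEncoding, when converting the string to bytes (for example with dataUsingEncoding:).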

Read file with UTF-8 in Haskell as IO String

时间秒杀一切 submitted on 2021-02-18 21:13:32
Question: I have the following code, which works fine unless the file contains UTF-8 characters: module Main where import Ref main = do text <- getLine theInput <- readFile text writeFile ("a"++text) (unlist . proc . lines $ theInput) With UTF-8 characters I get this: hGetContents: invalid argument (invalid byte sequence) Since the file I'm working with has UTF-8 characters, I would like to handle this exception so that I can reuse the functions imported from Ref if possible. Is there a way to read a UTF-8
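readFile decodes the file with the locale's default encoding. Under a non-UTF-8 locale, UTF-8 input therefore triggers the invalid byte sequence error. In Haskell, the usual remedy is to open the file yourself and call hSetEncoding with utf8 (both from System.IO) before reading. The Python sketch below shows the same idea: decode explicitly as UTF-8 instead of trusting the locale. Here proc is a hypothetical placeholder for the question's function imported from Ref.

```python
from pathlib import Path
from typing import Callable, List

def transform_file(src: str, dst: str, proc: Callable[[List[str]], List[str]]) -> None:
    # Decode explicitly as UTF-8 rather than relying on the locale's default encoding
    text = Path(src).read_text(encoding="utf-8")
    result = proc(text.splitlines())
    # Re-join with a newline after every line (as Haskell's unlines does) and write UTF-8
    Path(dst).write_text("".join(line + "\n" for line in result), encoding="utf-8")
```

The output is written as UTF-8 too, so the processed file round-trips the same characters that were read.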

Issue in DocumentTermMatrix with corpus in German

余生颓废 submitted on 2021-02-18 18:53:42
Question: I created a corpus in R using the tm package, specifying language and encoding as follows: de_DE.corpus <- Corpus(VectorSource(de_DE.sample), readerControl = list(language="de_DE",encoding = "UTF_8")) de_DE.corpus[36]$content de_DE.dtm <- DocumentTermMatrix(de_DE.corpus,control = list (encoding = 'UTF-8')) inspect(de_DE.dtm[, grepl("grÃ", de_DE.dtm$dimnames$Terms)]) inspect(de_DE.dtm[36, ]) If I view the content of document 36 (which contains 'ü') with de_DE.corpus[36]$content, the text is shown correctly. e
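The grep pattern "grÃ" suggests that the UTF-8 bytes of words like "grün" are being read somewhere as Latin-1 (or Windows-1252) text. In UTF-8, ü is the byte pair C3 BC. Read one byte at a time, those bytes display as "Ã¼". This reading of the symptom is an inference, because the question is cut off before the actual output. A short Python illustration of the misreading and its reversal:

```python
word = "grün"

utf8 = word.encode("utf-8")                           # ü is stored as two bytes: 0xC3 0xBC
garbled = utf8.decode("latin-1")                      # each byte misread as one Latin-1 character
repaired = garbled.encode("latin-1").decode("utf-8")  # undo the misreading

print(garbled)   # grÃ¼n
print(repaired)  # grün
```

One detail worth double-checking in the question's code: readerControl spells the encoding "UTF_8" with an underscore, while the DocumentTermMatrix call uses 'UTF-8'.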

How to read.table with “Hebrew” column names (in R)?

会有一股神秘感。 submitted on 2021-02-18 17:16:54
Question: I am trying to read a .txt file with Hebrew column names, but without success. I uploaded an example file to http://www.talgalili.com/files/aa.txt and am trying the command: read.table("http://www.talgalili.com/files/aa.txt", header = T, sep = "\t") This returns:
  X.....ª X...ª...... X...œ....
1      12          97         6
2     123         354        44
3       6           1         3
Instead of:
אחת שתיים שלוש
12 97 6
123 354 44
6 1 3
My output for l10n_info() is:
$MBCS [1] FALSE
$`UTF-8` [1] FALSE
$`Latin-1` [1] TRUE
$codepage [1] 1252
And
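The l10n_info() output shows a session in a Latin-1 locale (code page 1252), which contains no Hebrew letters. The header bytes therefore apparently end up interpreted as Latin-1 characters, and make.names then mangles them further into names like X.....ª. The question does not say how aa.txt is encoded. The Python sketch below assumes it is either UTF-8 or Windows-1255 (the Windows Hebrew code page) and tries them in that order. In R itself, the fileEncoding argument of read.table is where the file's encoding is declared.

```python
from typing import Sequence, Tuple

def decode_with_fallback(data: bytes,
                         encodings: Sequence[str] = ("utf-8", "cp1255")) -> Tuple[str, str]:
    # Try each candidate encoding in order; return the text and the encoding that worked
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")
```

UTF-8 goes first because it is strict: Windows-1255 bytes rarely form valid UTF-8. A single-byte code page, by contrast, decodes almost any input, so putting it first would appear to succeed on UTF-8 files too.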

Jekyll encoding name of category special characters

Deadly submitted on 2021-02-18 17:09:37
Question: My Jekyll installation used to work. Since an update, I have had an issue with URLs containing tag names that include special characters. I now get an error message when trying to reach a URL with special characters in it, like http://127.0.0.1:4000/tag/Actualit%C3%A9%20europ%C3%A9enne/, where Actualité européenne is the name of a category. The error message is incompatible character encodings: UTF-8 and ASCII-8BIT. All the files in the _posts directory are UTF-8. Here is the stack trace: [2017
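The %C3%A9 escapes in the URL are the two UTF-8 bytes of é. Percent-decoding on its own yields raw bytes. Ruby labels such a string ASCII-8BIT (binary), and mixing it with a UTF-8 string that also contains non-ASCII characters raises Encoding::CompatibilityError with this message. Whether that is what happens inside Jekyll here is a guess, since the stack trace is cut off. The two decoding steps, sketched in Python:

```python
from urllib.parse import unquote, unquote_to_bytes

segment = "Actualit%C3%A9%20europ%C3%A9enne"  # tag segment from the question's URL

raw = unquote_to_bytes(segment)  # plain bytes, no character encoding attached yet
name = raw.decode("utf-8")       # the %XX escapes are UTF-8, so decode them as UTF-8
same = unquote(segment)          # unquote() does both steps; its default encoding is UTF-8

print(raw)   # b'Actualit\xc3\xa9 europ\xc3\xa9enne'
print(name)  # Actualité européenne
```

In Ruby, the counterpart of the decode step is relabelling the bytes with String#force_encoding(Encoding::UTF_8) before they are combined with other UTF-8 strings.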

Whitespace in a database field is not removed by trim()

点点圈 submitted on 2021-02-18 16:58:06
Question: I have some whitespace at the beginning of a paragraph in a text field in MySQL. Using trim($var_text_field) in PHP or TRIM(text_field) in MySQL statements does absolutely nothing. What could this whitespace be, and how do I remove it in code? If I go into the database and delete it with backspace, it saves properly. It's just not being removed by the trim() functions. Answer 1: function UberTrim($s) { $s = preg_replace('/\xA0/u', ' ', $s); // strips UTF-8 NBSP: "\xC2\xA0" $s = trim($s); return $s; } The
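By default, PHP's trim() strips only space, \t, \n, \r, \0 and \x0B, and MySQL's TRIM() removes only spaces. A no-break space (U+00A0, stored in UTF-8 as C2 A0) therefore survives both, even though it looks like ordinary whitespace. The Python sketch below mimics PHP's default character set to show the difference and mirrors UberTrim's replace-then-trim idea. The function names are illustrative.

```python
# Characters PHP's trim() strips by default: space, \t, \n, \r, \0, \x0B
PHP_TRIM_CHARS = " \t\n\r\0\x0b"

def php_like_trim(s: str) -> str:
    # Mimics PHP trim(): the no-break space (U+00A0) is NOT in the set
    return s.strip(PHP_TRIM_CHARS)

def uber_trim(s: str) -> str:
    # Same idea as UberTrim: turn each NBSP into an ordinary space, then trim
    return php_like_trim(s.replace("\u00a0", " "))

sample = "\u00a0\u00a0Paragraph text "
print(repr(php_like_trim(sample)))
print(repr(uber_trim(sample)))
```

The explicit character set matters in Python. Called with no argument, str.strip() would remove U+00A0 as well, because Python treats it as whitespace.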

C++ crash when using _setmode with _O_U8TEXT to handle Unicode

你离开我真会死。 submitted on 2021-02-18 15:38:14
Question: What I've tried in order to print Unicode is: _setmode(_fileno(stdout), _O_U8TEXT); string str = u8"unicode 한글 hangul"; cout << str << endl; I used _setmode to display and read Unicode correctly, but it crashed with a Debug Assertion Failed error. However, with _setmode(_fileno(stdout), _O_U16TEXT); wstring str = L"unicode 한글 hangul"; wcout << str << endl; the program compiles and prints correctly. What should I do to use UTF-8? Do I have to find another trick? Answer 1: _setmode mentions _O_U8TEXT and _O_U16TEXT (finally), but