unicode

Is it possible to “sniff” the Character encoding?

我的梦境 posted on 2021-02-08 14:53:33
Question: I have a webpage that accepts CSV files. These files may be created in a variety of places. As far as I know, there is no way to specify the encoding in a CSV file, so I cannot reliably treat all of them as UTF-8 or any other encoding. Is there a way to intelligently guess the encoding of the CSV I am getting? I am working with Python, but I am willing to use language-agnostic methods too.

Answer 1: There is no correct way to determine the encoding of a file by looking at only the file itself, but you
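Since the answer above is cut off, here is a minimal sketch of one common heuristic, not necessarily what the answer went on to recommend. It checks for a byte-order mark, then tries a short list of candidate encodings in order. The function name `guess_encoding` and the candidate list are assumptions made for illustration; a successful decode only means the bytes are valid in that encoding, not that the guess is right.

```python
import codecs

# Byte-order marks, longest first so a UTF-32 BOM is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, candidates=("utf-8", "cp1252")) -> str:
    """Return a plausible encoding name for `data` (a heuristic, not proof)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so decoding never fails.
    return "latin-1"
```

Trying UTF-8 first is deliberate: random non-UTF-8 bytes rarely form valid UTF-8 sequences, so a clean UTF-8 decode is a fairly strong signal, whereas single-byte encodings accept almost anything.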

System.Uri drops Unicode RLM (Right-to-Left Mark; U+200F) character in .NET 4.5+

為{幸葍}努か posted on 2021-02-08 13:47:35
Question:

    using System;

    namespace UnicodeRlm
    {
        class Program
        {
            static void Main(string[] args)
            {
                var uri = new Uri(
                    "https://example.com/attachments/The title is \"مفتاح معايير الويب!‏\" in Arabic.pdf");
                Console.WriteLine(uri.AbsolutePath);
                Console.WriteLine(uri.AbsolutePath.Length);
            }
        }
    }

Under .NET 4.0, this produces

    /attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%E2%80%8F%22%20in%20Arabic.pdf
    168

Under .NET 4
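In the .NET 4.0 output above, the `%E2%80%8F` just before `%22` is the UTF-8 percent-encoding of U+200F. The following is a small Python sketch (not .NET code) for checking whether a percent-encoded path still carries the mark; the helper name `count_rlm` is hypothetical.

```python
from urllib.parse import quote, unquote

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Percent-encode the mark on its own to see its UTF-8 byte sequence.
encoded_rlm = quote(RLM)
print(encoded_rlm)


def count_rlm(percent_encoded_path: str) -> int:
    """Count RIGHT-TO-LEFT MARK characters in a percent-encoded path."""
    return unquote(percent_encoded_path).count(RLM)
```

Running `count_rlm` on the path printed by each framework version would show directly whether the character was dropped.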

I want to display Greek Unicode characters but I get “?” instead on output

孤街浪徒 posted on 2021-02-08 10:28:01
Question: I want to print n Greek characters in Java, starting from alpha (which has the code "\u03B1"). This is what I had in mind:

    String T = "";
    for(int i = 0,aux = 0;i<n;i++) {
        aux = '\u03B1' + i;
        T +=Character.toString((char)aux);
    }
    System.out.println(T);

But it prints n question marks instead. Say n=3; the output I get is "???". I thought that maybe my method was wrong, but if I try something like this:

    System.out.println("\u03B1\u03B2\u03B3");

I get the same output: "???". Why do I
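The excerpt ends before any answer, so the cause is not stated here. One common explanation for this symptom, offered as an assumption, is that the string itself is correct and the output encoding cannot represent Greek letters, so each one is replaced with "?". A minimal Python sketch of that effect (the function name `greek_letters` is made up for illustration):

```python
def greek_letters(n: int) -> str:
    """Build n consecutive characters starting at U+03B1 (alpha)."""
    return "".join(chr(0x03B1 + i) for i in range(n))


text = greek_letters(3)

# The string holds the right code points; ascii() shows them as escapes,
# which is safe to print on any console.
print(ascii(text))

# Forcing the text through an encoding that lacks Greek letters replaces
# each character with "?", matching the symptom in the question.
lossy = text.encode("ascii", errors="replace")
print(lossy)
```

If this is what happens in the Java case, the fix lies in the console or output stream encoding, not in the loop that builds the string.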

Convert unicode small capitals to their ASCII equivalents

大憨熊 posted on 2021-02-08 10:22:56
Question: I have the following dataset:

    'Fʀɪᴇɴᴅ', 'ᴍᴏᴍ', 'ᴍᴀᴋᴇs', 'ʜᴏᴜʀʟʏ', 'ᴛʜᴇ', 'ᴄᴏᴍᴘᴜᴛᴇʀ', 'ʙᴇᴇɴ', 'ᴏᴜᴛ', 'ᴀ', 'ᴊᴏʙ', 'ғᴏʀ', 'ᴍᴏɴᴛʜs', 'ʙᴜᴛ', 'ʟᴀsᴛ', 'ᴍᴏɴᴛʜ', 'ʜᴇʀ', 'ᴄʜᴇᴄᴋ', 'ᴊᴜsᴛ', 'ᴡᴏʀᴋɪɴɢ', 'ғᴇᴡ', 'ʜᴏᴜʀs', 'sᴏᴜʀᴄᴇ',

I want to convert them to ASCII using a Python script, for example:

    Fʀɪᴇɴᴅ - FRIEND
    ᴍᴏᴍ - MOM

I have tried encoding and decoding, but that doesn't work. I have also tried this solution, but it doesn't solve my problem.

Answer 1: Python doesn't provide a way to directly convert small caps
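The answer is truncated, so the following is only one possible approach, sketched under assumptions: many small-capital letters have Unicode names of the form `LATIN LETTER SMALL CAPITAL X`, so the last word of the name can stand in for the ASCII letter. The function name `small_caps_to_ascii` is hypothetical. Look-alikes from other scripts, such as the Cyrillic `ғ` in the dataset, are not covered by this rule and would need an explicit mapping table.

```python
import unicodedata

_PREFIX = "LATIN LETTER SMALL CAPITAL "


def small_caps_to_ascii(text: str) -> str:
    """Replace Latin small-capital letters with plain uppercase ASCII."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        rest = name[len(_PREFIX):] if name.startswith(_PREFIX) else ""
        if len(rest) == 1 and rest.isascii() and rest.isalpha():
            # e.g. "LATIN LETTER SMALL CAPITAL R" -> "R"
            out.append(rest)
        else:
            # Ordinary letters are simply uppercased; anything else passes through.
            out.append(ch.upper())
    return "".join(out)
```

Usage: `small_caps_to_ascii("ᴍᴏᴍ")` yields `"MOM"` when the input uses the small-capital code points U+1D0D and U+1D0F.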

Converting unicode characters (C#) Testing

六月ゝ 毕业季﹏ posted on 2021-02-08 10:18:39
Question: I am using an SDK that returns values from a database. The values I need are 4, 6, and 7, but the SDK returns "\u0004", "\u0006", "\u0007". Is there a way to check whether the value is "\u0004", "\u0006", or "\u0007", or some other way of doing this? I have the following code (C#):

    Line_Type = Line_Type.Replace(@"\u000", "");
    if (Line_Type == "4") { Line_Type = "4"; }
    else if (Line_Type == "6") { Line_Type = "6"; }
    else if (Line_Type == "7") { Line_Type = "7"; }

I have tried
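A sketch of one reading of the problem, shown in Python: assuming the SDK returns a one-character string holding the control character U+0004 (and not the six characters backslash, u, 0, 0, 0, 4), the number wanted is simply that character's code point. The function name `line_type_number` is hypothetical. Under the same assumption, the `Replace(@"\u000", "")` call in the question would find nothing to replace, because a C# verbatim string treats the backslash literally.

```python
def line_type_number(value: str) -> int:
    """Return the code point of a single-character string, e.g. U+0004 -> 4."""
    if len(value) != 1:
        raise ValueError(f"expected exactly one character, got {len(value)}")
    return ord(value)


for raw in ("\u0004", "\u0006", "\u0007"):
    print(line_type_number(raw))
```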

Generate a random unicode string

不问归期 posted on 2021-02-08 09:24:06
Question: In VS2010, the function below prints "stdout in error state", and I can't understand why. Any thoughts on what I'm doing wrong?

    void printUnicodeChars()
    {
        const auto beg = 0x0030;
        const auto end = 0x0039;
        wchar_t uchars[end-beg+2];
        for (auto i = beg; i <= end; i++) {
            uchars[i-beg] = i; // I tried a static_cast<wchar_t>(i), still errors!
        }
        uchars[end+1] = L'\0';
        std::wcout << uchars << std::endl;
        if (!std::wcout) {
            std::cerr << std::endl << "stdout in error state" << std::endl;
        } else {
            std

Beautiful Soup conversion of Unicode characters to HTML entities

╄→гoц情女王★ posted on 2021-02-08 09:16:00
Question: This error occurs after loading the document into Beautiful Soup. The document contains entities like &ldquo;, which get converted to ΓÇ£. I want to output the HTML entities (&ldquo;) instead.

Answer 1: Use this reference link:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_doc)
    print(soup.prettify(formatter="html"))

Source: https://stackoverflow.com/questions/23191624/beautiful-soup-conversion-of-unicode-characters-to-html-entities
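As a supplement to the answer, here is a standard-library sketch that does not use Beautiful Soup. It illustrates two things under stated assumptions: the garbled `ΓÇ£` is what the UTF-8 bytes of a left double quotation mark look like when shown in code page 437 (an assumption about the asker's console), and named entities can be produced from code points with `html.entities.codepoint2name`. The function name `to_named_entities` is hypothetical.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where one exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 127 and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out)


# UTF-8 bytes of U+201C read back as code page 437 give the garbled text.
mojibake = "\u201c".encode("utf-8").decode("cp437")
print(ascii(mojibake))
print(to_named_entities("\u201cquoted\u201d"))
```

The ASCII check keeps markup characters such as `<` and `&` untouched, so the function is only suitable for text that is already valid HTML.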

IIS does not encode utf-8 urls?

一世执手 posted on 2021-02-08 08:50:08
Question: I'm running Joomla 2.5 on an IIS7 server. The problem is that Joomla's search-engine-friendly (SEF) URLs don't work: whatever URL I enter, it goes to index.php. After a painful day of struggling with rewrite rules and IIS settings, I came to two realizations:

SEF URLs are only broken when the URLs contain Unicode characters.
On my WAMP server, where the SEF URLs work perfectly, $_SERVER['REQUEST_URI'] is "mydomain/%D9%85%D8%AD%D8%B5%D9%88%D9%84%D8%A7%D8%AA/%D9%82%D9%84%D8%A8%DB%8C-%D8%B9%D8%B1%D9
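For reference, the first path segment in the REQUEST_URI quoted above is ordinary UTF-8 percent-encoding. A short Python sketch of the round trip follows; it only shows what the encoded form stands for and does not reproduce the IIS behavior described in the question.

```python
from urllib.parse import quote, unquote

# First path segment of the REQUEST_URI quoted in the question.
segment = "%D9%85%D8%AD%D8%B5%D9%88%D9%84%D8%A7%D8%AA"

decoded = unquote(segment)      # bytes are interpreted as UTF-8 by default
re_encoded = quote(decoded)     # encodes back to UTF-8 percent-escapes

print(ascii(decoded))
print(re_encoded == segment)
```

A server that hands the application the decoded text in a different encoding, or not percent-encoded at all, would make a comparison against this form fail.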

How can I get python ''.encode('unicode_escape') to return escape codes for ascii?

孤街醉人 posted on 2021-02-08 07:27:07
Question: I am trying to use the encode method of Python strings to return the Unicode escape codes for characters, like this:

    >>> print( 'ф'.encode('unicode_escape').decode('utf8') )
    \u0444

This works fine with non-ASCII characters, but for ASCII characters it just returns the characters themselves:

    >>> print( 'f'.encode('unicode_escape').decode('utf8') )
    f

The desired output would be \u0066. This script is for pedagogical purposes. How can I get the Unicode hex codes for ALL characters?

Answer 1:
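The answer text is missing from this excerpt. One possible approach, not necessarily the one the original answer gave, is to skip the `unicode_escape` codec and format each code point directly. The function name `escape_all` is an assumption made for illustration.

```python
def escape_all(text: str) -> str:
    """Return \\uXXXX (or \\UXXXXXXXX above the BMP) for every character."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            parts.append(f"\\u{cp:04x}")
        else:
            # Code points beyond U+FFFF need the eight-digit form.
            parts.append(f"\\U{cp:08x}")
    return "".join(parts)


print(escape_all("f"))
print(escape_all("ф"))
```

This treats ASCII and non-ASCII characters uniformly, which is what the question asks for.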