non-ascii-characters | 易学教程

Accented characters in MySQL table

I have some texts in French (containing accented characters such as "é"), stored in a MySQL table whose collation is utf8_unicode_ci (both the table and the columns), that I want to output on an HTML5 page. The HTML page charset is UTF-8 (<meta charset="utf-8" />) and the PHP files themselves are encoded as "UTF-8 without BOM" (I use Notepad++ on Windows). I use PHP5 to query the database and generate the HTML. However, on the output page, the special characters (such as "é") appear garbled and are replaced by "�". When I browse the database (via phpMyAdmin) those same accented characters

“UnicodeEncodeError: 'ascii' codec can't encode character”

I'm trying to pass big strings of random HTML through regular expressions, and my Python 2.6 script is choking on this: UnicodeEncodeError: 'ascii' codec can't encode character. I traced it back to a trademark superscript at the end of this word: Protection™, and I expect to encounter others like it in the future. Is there a module to process non-ASCII characters? Or what is the best way to handle or escape non-ASCII text in Python? Thanks! Full error: E ====================================================================== ERROR: test_untitled (__main__.Untitled) ------------------------------

Why is this symbol showing up on Chrome and not Firefox or Edge?

Question: This web page is rendering with these symbols, and they are found throughout this website/application but on no other sites. Can anyone tell me (1) what the symbol is and (2) why it is showing up in only one browser? Answer 1: That character is U+2028 Line Separator, which is a kind of newline character. Think of it as the Unicode equivalent of HTML’s <br>. As to why it shows up here: my guess would be that an internal database uses LSEP to not conflict with literal newlines or HTML tags (which might
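As a minimal sketch of how such characters could be detected and cleaned up before rendering, assuming the stray glyph really is U+2028 as the answer suggests: the function names below are hypothetical, and treating U+2029 (Paragraph Separator) the same way is an added assumption, not something the answer states.

```javascript
// Hypothetical helpers; not taken from the original answer.
// U+2028 is LINE SEPARATOR; U+2029 (PARAGRAPH SEPARATOR) is included as an assumption.

// Returns true if the text contains either separator.
// A non-global regex is used so that .test() keeps no state between calls.
function hasUnicodeSeparators(text) {
  return /[\u2028\u2029]/.test(text);
}

// Replaces each separator with an ordinary newline character.
function normalizeUnicodeSeparators(text) {
  return text.replace(/[\u2028\u2029]/g, "\n");
}
```

Whether a plain newline is the right replacement depends on how the page treats whitespace; a space or a <br> element may fit better in some layouts.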

Remove non-ASCII non-printable characters from a String

I get user input that includes non-ASCII and non-printable characters, such as \xc2d \xa0 \xe7 \xc3\ufffdd \xc3\ufffdd \xc2\xa0 \xc3\xa7 \xa0\xa0. For example: email : abc@gmail.com\xa0\xa0 street : 123 Main St.\xc2\xa0 Desired output: email : abc@gmail.com street : 123 Main St. What is the best way to remove them using Java? I tried the following, but it doesn't seem to work: public static void main(String args[]) throws UnsupportedEncodingException { String s = "abc@gmail\\xe9.com"; String email = "abc@gmail.com\\xa0\\xa0"; System.out.println(s.replaceAll("\\P{Print}", "")); System.out

R on Windows: character encoding hell

I am trying to import a CSV encoded as OEM-866 (Cyrillic charset) into R on Windows. I also have a copy that has been converted into UTF-8 w/o BOM. Both of these files are readable by all other applications on my system, once the encoding is specified. Furthermore, on Linux, R can read these particular files with the specified encodings just fine. I can also read the CSV on Windows IF I do not specify the "fileEncoding" parameter, but this results in unreadable text. When I specify the file encoding on Windows, I always get the following errors, for both the OEM and the Unicode file: Original

Remove non-ascii character in string

Question: var str="INFO] :谷��新道, ひば��ヶ丘２丁��, ひばりヶ��, 東久留米市 (Higashikurume)"; I need to remove all non-ASCII characters from the string, so that str contains only "INFO] (Higashikurume)". Answer 1: ASCII is the range 0 to 127, so: str.replace(/[^\x00-\x7F]/g, ""); Answer 2: It can also be done by positively matching the characters to remove, like this: textContent = textContent.replace(/[\u{0080}-\u{FFFF}]/gu,""); This uses Unicode. In JavaScript, when expressing Unicode in a regular expression, the characters are
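The first answer's approach can be wrapped in a small function, as in the sketch below. The name stripNonAscii is hypothetical. Note that this only deletes code units above 0x7F; ASCII punctuation and spaces that sat between the removed characters stay in place, so the question's sample string would still need extra cleanup to reach the exact desired output.

```javascript
// Hypothetical helper wrapping the regex from Answer 1.
// Without the "u" flag the regex works on UTF-16 code units, so both halves
// of a surrogate pair (for example an emoji) are above 0x7F and get removed.
function stripNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, "");
}

// ASCII punctuation and spaces are kept; only non-ASCII code units are dropped.
const cleaned = stripNonAscii("Protection\u2122 (caf\u00E9)");
console.log(cleaned);
```

One difference worth keeping in mind: the second answer's pattern, /[\u{0080}-\u{FFFF}]/gu, matches whole code points in the range U+0080 to U+FFFF, so characters above U+FFFF would not be removed by it.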

How to ignore acute accent in a javascript regex match?

I need to match a word like 'César' with a regex like this: /^cesar/i. Is there an option like /i to configure the regex so that it ignores acute accents? Or is the only solution to use a regex like this: /^césar/i? The standard ECMAScript regex isn't ready for Unicode (see http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode ). So you have to use an external regex library. I used this one (with the Unicode plugin) in the past: http://xregexp.com/ In your case, you may have to escape the char é as \u00E9 and define a range encompassing e, é, ê, etc. EDIT : I just saw the comment
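Two ways to get an accent-insensitive match are sketched below. The first is a swapped-in technique that the answer does not mention: it relies on String.prototype.normalize (ES2015 and later) to decompose accented letters and then strips the combining marks in the U+0300 to U+036F block. The second follows the answer's suggestion of a character class listing the accented variants. The names stripAccents, startsWithCesar, and cesarPattern are hypothetical.

```javascript
// Approach 1 (not from the answer): decompose to NFD, then drop combining
// diacritical marks (U+0300..U+036F) so "é" becomes a plain "e".
function stripAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Applies the question's /^cesar/i pattern to the accent-stripped input.
function startsWithCesar(text) {
  return /^cesar/i.test(stripAccents(text));
}

// Approach 2 (the answer's suggestion): list the accented variants explicitly.
// \u00E9 = é, \u00E8 = è, \u00EA = ê, \u00EB = ë
const cesarPattern = /^c[e\u00E9\u00E8\u00EA\u00EB]sar/i;
```

The normalization approach covers any letter that decomposes into a base letter plus combining marks, while the character class only covers the variants that are listed by hand.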

Remove Unicode characters in a String

Question: How do I remove all special characters that fall outside the ASCII range in VBA? These are some of the symbols which appear in my string: Œ œ Š š Ÿ ƒ There are many more such characters. They are not ASCII characters, as you can see here: http://www.ascii.cl/htmlcodes.htm I tried something like this: strName = Replace(strName, ChrW(376), " ") Answer 1: Would a RegEx solution be of interest to you? There are plenty of examples for different languages on this site - here's a C# one: How can

How to printf accented characters in ANSI C (like á é í ó ú)

Question: I tried to printf some accented characters such as á é í ó ú: printf("my name is Seán\n"); The text editor in the Dev-C++ IDE displays them fine, i.e. the source code looks fine. I guess I need some library other than stdio.h and maybe some variant of the normal printf. I'm using the Bloodshed Dev-C++ IDE running on Windows XP. Answer 1: The Windows console is generally considered badly broken with regard to character encodings. You can read about this problem here, for example. The problem is that

Unicode support in Web standard fonts

Question: I need to decide whether to render geometric symbols in a web GUI (e.g. arrows and triangles for buttons, menus, etc.) as Unicode symbols (MUCH easier and color-independent) or GIF/PNG files (lots of hassle I would like to avoid). However, I have seen clients that have trouble displaying even advanced punctuation symbols declared as Unicode characters (Example). Does anybody know from which version on OSs / Service Packs / Applications ship with Unicode versions of the standard fonts? There