unicode | 易学教程

Why is /[\w-+]/ a valid regex but /[\w-+]/u invalid?

Question: If I type /[\w-+]/ in the Chrome console, it accepts it. I get a regex object I can use to test strings as usual. But if I type /[\w-+]/u, it says VM112:1 Uncaught SyntaxError: Invalid regular expression: /[\w-+]/: Invalid character class. In Firefox, /[\w-+]/ works fine, but if I type /[\w-+]/u in the console, it just goes to the next line as if I had typed an incomplete statement. If I try to force it to create the regex by running eval('/[\w-+]/u'), it tells me SyntaxError: invalid range in
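
For comparison only, here is a minimal Python sketch of the same pitfall (Python is an assumption of this example; the question itself is about JavaScript). Python's re module also refuses to use the class escape \w as the start of a range, and both common rewrites, moving the hyphen to the end of the class or escaping it, make the hyphen an unambiguous literal. The same two rewrites, /[\w+-]/u and /[\w\-+]/u, should also be accepted by JavaScript in u mode. The helper name compiles is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# `\w-+` reads as a range whose start is a whole character class.
ambiguous = r"[\w-+]"
# Hyphen moved to the end, or escaped, so it can only be a literal.
hyphen_last = r"[\w+-]"
hyphen_escaped = r"[\w\-+]"

print(compiles(ambiguous))
print(compiles(hyphen_last))
print(compiles(hyphen_escaped))
```

Either rewrite matches word characters, a plus sign, and a hyphen, which is presumably what the original class was meant to do.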

Copying emojis in text from MySQL to SQL Server

Question: I am copying data from MySQL to SQL Server using a linked server. SELECT comment FROM openquery(my_linked_server, 'SELECT comment FROM search_data'); The text in the MySQL table column is xxx 🤘 xxx . By the time I receive it in SQL Server it is xxx ðŸ¤˜ xxx . The MySQL table is utf8mb4, and I have set up the ODBC config for the linked server to use this. I am using MySQL ODBC 5.3.13. Any advice would be appreciated. The SQL Server version is 2016. I have seen examples that do select N'🤘' etc,
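
The garbled value in the question has the typical shape of UTF-8 bytes being read as Windows-1252. The following Python sketch reproduces it; it is only a diagnostic illustration and says nothing about which layer (ODBC driver, linked server, or column type) performs the misreading, which the excerpt does not establish.

```python
original = "xxx 🤘 xxx"

# Encode as UTF-8 (the bytes a utf8mb4 column holds), then misread those
# bytes as Windows-1252, one byte per character.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the misreading recovers the text, so no bytes were lost;
# they were only interpreted with the wrong code page.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

If the round trip succeeds on real data, the bytes arrived intact and the fix lies in how the receiving side declares or converts the encoding.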

What is an example for non unicode character set for -Dfile.encoding=?

Question: I have a JVM where the character set is specified as "-Dfile.encoding=UTF-8". This is how UTF-8 is set. I want to set it to a non-Unicode character set. Is there an example value for a non-Unicode character set that I can pass to -Dfile.encoding= ? Answer 1: [ TL;DR => Application encoding is a confusing issue, but this document from Oracle should help. ] First, a few important general points about specifying the encoding by setting the system property file.encoding at run time: Its use is not formally

Retaining special character while reading from html java?

Question: I am trying to read an HTML source file that contains German characters such as ä ö ü ß €. Reading with jsoup: citAttr.nextElementSibling().text() Encoding the string with org.apache.commons.lang3.text.translate.UnicodeEscaper: unicodeEscaper.translate(citAttr.nextElementSibling().text()) The issue is that after reading, the characters turn into � . By contrast, when reading a CSV with UTF-8 encoding, saving and retrieving the characters with the same unicodeEscaper works fine. unicodeEscaper.translate(record.get

How to convert String index to character index in Dart

Question: If I have an arbitrary String like this: final family = '\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}'; // 👨‍👩‍👧 final myString = 'Let me introduce my $family to you.'; And I know the String index of the character after the family emoji (the space) is 28, how do I find the String index of the first code unit of the family emoji? In other words, how do I find the length in UTF-16 code units of the family emoji? I've asked a similar question before, but that was before the characters package
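
The index arithmetic in the question can be checked with a short sketch. It is written in Python rather than Dart (an assumption of this example), and it assumes the emoji's code points are already known; finding grapheme boundaries in an arbitrary string needs a grapheme-aware library. The helper name utf16_length is hypothetical.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    return len(text.encode("utf-16-le")) // 2


# Man, zero-width joiner, woman, zero-width joiner, girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
prefix = "Let me introduce my "

end_index = 28  # UTF-16 index of the space after the emoji, from the question
family_units = utf16_length(family)
start_index = end_index - family_units

print(family_units)
print(start_index)
# Sanity check: the text before the emoji occupies exactly start_index units.
print(utf16_length(prefix) == start_index)
```

Each of the three people is outside the Basic Multilingual Plane and takes two code units, while each joiner takes one, which is why the single visible glyph spans several String indices.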

Windows console app stops printing when I switch to Edge

Question: I have to write a console app that logs the active window's PID, text length, and text. It works except when I switch to Edge. Execution doesn't stop, but only the PID and text length get printed to the screen. Please help, I don't know what else to try. #include <iostream> #include <Windows.h> #include <WinUser.h> int main() { // Use environment's default locale for char type setlocale(LC_CTYPE, ""); std::cout << "Hello é World!\n"; while (1) { // Get foreground window HWND hwnd =

Matching Unicode letter characters in PCRE/PHP

Question: I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern: // unicode letters, apostrophe, hyphen, space $namePattern = "/^([\\p{L}'\\- ])+$/"; This is eventually passed to a call to preg_match(). As far as I can tell, this works with your vanilla ASCII alphabet, but seems to trip up on spicier characters like Ă or 张. Is there something wrong with the pattern itself? Perhaps I'm expecting \p{L} to do more work than I think

output utf8 in console with Visual Studio (wide stream)

Question: This piece of code works when I compile it with mingw32 on Windows 10 and emits the right result, as you can see below: C:\prj\cd>bin\main.exe 1°à€3§4ç5@の,は,でした,象形字 ; But when I compile it with Visual Studio 17, the same code emits the wrong characters: /out:prova.exe prova.obj C:\prj\cd>prova.exe 1Â°Ã â‚¬3Â§4Ã§5@ã®,ã¯,ã§ã—ãŸ,è±¡å½¢å— ; C:\prj\cd> Here is the source code: #include <windows.h> #include <io.h> #include <fcntl.h> #include <stdio.h> #include <string> #include <iostream> int main ( void )

Unicode String in urllib.request [duplicate]

Question: This question already has answers here: UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' - -when using urlib.request python3 (2 answers). Closed 1 year ago. The short version: I have a variable s = 'bär'. I need to convert s to ASCII so that s = 'b%C3%A4r'. Long version: I'm using urllib.request.urlopen() to read an mp3 pronunciation file from a URL. This has worked very well, except I ran into a problem because the URLs often contain Unicode characters. For example, the
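
The conversion described in the short version is percent-encoding, which the standard library exposes as urllib.parse.quote. The sketch below shows it on the question's string and on a whole URL; the helper name ascii_url and the example.com address are hypothetical, and the helper assumes the URL is not already percent-encoded, since quote would then encode the % signs a second time.

```python
from urllib.parse import quote, urlsplit, urlunsplit

s = "bär"
# quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
encoded = quote(s)
print(encoded)


def ascii_url(url: str) -> str:
    """Percent-encode the path and query of `url`, keeping its structure."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path),               # "/" stays unescaped by default
        quote(parts.query, safe="=&"),   # keep key=value&key=value separators
        parts.fragment,
    ))


# Hypothetical URL, used only to illustrate the helper.
print(ascii_url("https://example.com/audio/bär.mp3"))
```

Encoding only the path and query, rather than the whole string, keeps the scheme and host separators (:// and /) from being escaped.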