unicode-string | 易学教程

How do I isolate a space using RegExp in VBA (\s vs. \p{Zs})?

Introduction/Question: I have been studying the use of regular expressions (using VBA/Excel), and so far I cannot work out how to isolate a <space> (or " ") using RegExp from the other whitespace characters that are included in \s. I thought that I would be able to use \p{Zs}, but in my testing so far it has not worked. Could someone please correct my misunderstanding? I appreciate any helpful input. To offer proper credit, I modified some code that started off as a very helpful post by @Portland Runner, found here: How to use Regular Expressions (Regex) in Microsoft Excel
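The distinction the question is after can be sketched in Python rather than VBA. To my knowledge the VBScript RegExp engine used from VBA does not recognise \p{...} classes, and Python's built-in re module does not either, so this sketch emulates \p{Zs} with a Unicode category lookup. The sample string and variable names are made up for illustration.

```python
import re
import unicodedata

# Illustrative sample: space, tab, newline, and a no-break space (U+00A0).
sample = "a b\tc\nd\u00a0e"

# \s matches every whitespace character in the sample.
all_whitespace = re.findall(r"\s", sample)

# A literal space (written here as \x20) matches only U+0020.
plain_spaces = re.findall(r"\x20", sample)

# Emulate \p{Zs} (space separators) via the Unicode general category,
# because the built-in re module has no \p{...} support.
space_separators = [ch for ch in sample if unicodedata.category(ch) == "Zs"]

print(all_whitespace)
print(plain_spaces)
print(space_separators)
```

If only the ordinary space is wanted, matching the literal character (or \x20) is the simplest option; the Zs category is broader and also covers characters such as the no-break space.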

Convert Unicode character to NSString

I have received a string from a web service that contains a Unicode character, and I want to convert it to a plain NSString. How can I do that? For example: "This isn\u0092t your bike". How can I remove the Unicode escape and replace it with the equivalent special character? The output should be: "This isn't your bike" char cString[] = "This isn\u2019t your bike"; NSData *data = [NSData dataWithBytes:cString length:strlen(cString)]; NSString *string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding]; NSLog(@"result string: %@", string); This should work. UPDATE FOR THE COMMENT: The unicode character
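One plausible reading of the \u0092 in the question is that a Windows-1252 byte (0x92, the right single quotation mark) was passed through as if it were a code point. The Python sketch below works under that assumption only; the function name is hypothetical and this is not the Objective-C approach from the answer.

```python
def fix_c1_as_cp1252(text: str) -> str:
    """Reinterpret C1 control characters (U+0080 to U+009F) as Windows-1252 bytes.

    Assumption: the C1 characters are mis-decoded Windows-1252 punctuation.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                out.append(bytes([ord(ch)]).decode("cp1252"))
            except UnicodeDecodeError:
                # A few bytes (for example 0x81) are undefined in Windows-1252.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


received = "This isn\u0092t your bike"
repaired = fix_c1_as_cp1252(received)
print(repaired)
```

Under this assumption the result contains U+2019 (the typographic apostrophe), which matches the character used in the answer's C string.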

Regex for a (twitter-like) hashtag that allows non-ASCII characters

I want a regex to match a simple hashtag like those on Twitter (e.g. #someword). I also want it to recognize non-standard characters (like those in Spanish, Hebrew or Chinese). This was my initial regex: (^|\s|\b)(#(\w+))\b --> but it doesn't recognize non-standard characters. Then I tried using XRegExp.js, which worked, but ran too slowly. Any suggestions for how to do it? Eventually I found this: twitter-text.js, which is basically how Twitter solves this problem. With native JS regexes that don't support unicode, your only option is to explicitly enumerate characters that
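For comparison, here is a minimal Python sketch of the same idea. It relies on the fact that \w is Unicode-aware for str patterns in Python 3; it is not the twitter-text rule set, and the pattern and sample text are illustrative only.

```python
import re

# "#" must not be preceded by a word character, then one or more word characters.
# In Python 3, \w on str patterns already covers letters from non-Latin scripts.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")

text = "Hola #español, shalom #עברית and #中文 but not foo#bar"
tags = HASHTAG.findall(text)
print(tags)
```

The lookbehind rejects "foo#bar" because the "#" follows a word character. Note that \w does not match combining marks, so decomposed accents would split a tag.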

Why is the length of this string longer than the number of characters in it?

This code: string a = "abc"; string b = "A𠈓C"; Console.WriteLine("Length a = {0}", a.Length); Console.WriteLine("Length b = {0}", b.Length); outputs: Length a = 3 Length b = 4 Why? The only thing I could imagine is that the Chinese character is 2 bytes long and that the .Length method returns the byte count. Everyone else is giving the surface answer, but there's a deeper rationale too: the number of "characters" is a difficult-to-define question and can be surprisingly expensive to compute, whereas a length property should be fast. Why is it difficult to define? Well, there's a few options
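The counts involved can be reproduced in Python as a rough illustration. Python's len counts code points, whereas .NET's Length counts UTF-16 code units, so the sketch derives the UTF-16 figure by encoding. The variable names are illustrative.

```python
# "A𠈓C": the middle character is U+20213, outside the Basic Multilingual Plane.
s = "A\U00020213C"

code_points = len(s)                           # what Python's len() counts
utf16_units = len(s.encode("utf-16-le")) // 2  # what .NET's string.Length counts
utf8_bytes = len(s.encode("utf-8"))            # byte count in UTF-8

print(code_points, utf16_units, utf8_bytes)
```

The character needs a surrogate pair in UTF-16, which is why the UTF-16 count is one higher than the number of code points.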

Java Unicode String length

Question: I am trying hard to get the character count of a Unicode string and have tried various options. It looks like a small problem, but I am badly stuck. Here I am trying to get the length of the string str1. I am getting 6, but it is actually 3. Moving the cursor over the string "குமார்" also shows it as 3 chars. Basically I want to measure the length and print each character, like "கு", "மா", "ர்". public class one { public static void main(String[] args) { String str1 = new String("குமார்"); System.out
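The gap between 6 and 3 is the gap between code points and user-perceived characters. The Python sketch below approximates the grouping by attaching each combining mark to the preceding base character. This is a simplification, not the full Unicode text segmentation rules, and the function name is hypothetical.

```python
import unicodedata


def approximate_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified approximation: any character whose general category starts
    with "M" (Mn, Mc, Me) is appended to the previous cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


name = "\u0b95\u0bc1\u0bae\u0bbe\u0bb0\u0bcd"  # குமார்
clusters = approximate_clusters(name)
print(len(name), len(clusters), clusters)
```

For this string the approximation yields the three units the question asks for; scripts with more complex joining behaviour would need a proper grapheme-cluster library.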

What is the range of Unicode Printable Characters?

Can anybody please tell me what the range of Unicode printable characters is? [e.g. the ASCII printable character range is \u0020 - \u007f] See http://en.wikipedia.org/wiki/Unicode_control_characters You might want to look especially at the C0 and C1 control characters: http://en.wikipedia.org/wiki/C0_and_C1_control_codes The wiki says the C0 control characters are in the range U+0000-U+001F plus U+007F (the same range as in ASCII), and the C1 control characters are in the range U+0080-U+009F. Besides the C0 and C1 control characters, Unicode also has hundreds of formatting control characters, e.g. zero-width non
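The ranges quoted from the wiki can be checked against the Unicode general categories. In this Python sketch the helper names are made up; it treats category Cc as the C0/C1 controls and Cf as the formatting controls, using the standard unicodedata module.

```python
import unicodedata


def is_c0_or_c1_control(ch):
    """True for C0/C1 control characters (general category Cc)."""
    return unicodedata.category(ch) == "Cc"


def is_format_control(ch):
    """True for formatting controls such as ZERO WIDTH NON-JOINER (category Cf)."""
    return unicodedata.category(ch) == "Cf"


# Collect every Cc code point below U+0100.
controls = [cp for cp in range(0x100) if is_c0_or_c1_control(chr(cp))]
print(len(controls), hex(controls[0]), hex(controls[-1]))
print(is_format_control("\u200c"))
```

Because control and format characters are scattered across the code space, a category test like this tends to be more practical than a single numeric range.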

PDO and UTF-8 special characters in PHP / MySQL?

Question: I am using MySQL and PHP 5.3 and tried this code. $dbhost = 'localhost'; $dbuser = 'root'; $dbpass = ''; $con = mysql_connect("localhost", "root", ""); mysql_set_charset('utf8'); if (!$con) { die('Could not connect: ' . mysql_error()); } mysql_select_db("kdict", $con); $sql = "SELECT * FROM `en-kh` where english='a'"; echo $sql; $result = mysql_query($sql); while($row = mysql_fetch_array($result)) { echo $row['english'] . " </br> " . $row['khmer']; echo "<br />"; } ?> => I got good UTF-8

Java: How to create unicode from string “\u00C3” etc

Question: I have a file that has strings hand-typed as \u00C3. I want to create the Unicode character that this escape represents in Java. I tried but could not find out how. Help. Edit: When I read the text file, the String will contain "\u00C3" not as Unicode but as the ASCII chars '\' 'u' '0' '0' 'C' '3'. I would like to form the Unicode character from that ASCII string. Answer 1: I picked this up somewhere on the web: String unescape(String s) { int i=0, len=s.length(); char c; StringBuffer sb = new StringBuffer
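The same transformation can be sketched in Python with a regular expression that finds each six-character \uXXXX sequence and replaces it with the character it names. This is an illustration of the idea, not a port of the truncated Java answer; the function name is hypothetical.

```python
import re

# A backslash, "u", then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text):
    """Replace literal \\uXXXX sequences with the characters they denote."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = r"caf\u00C3"  # six ASCII characters after "caf", as read from a file
converted = unescape_unicode(raw)
print(len(raw), len(converted), converted)
```

This sketch converts each escape independently, so a character written as two surrogate escapes would not be recombined into one code point.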

Converting a \u escaped Unicode string to ASCII

After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page and have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\u003D\\u003Ebig'). I want to convert this to the ASCII string, which should be 'pretty=>big'. More simply, if I set x <- 'pretty\\u003D\\u003Ebig' how do I perform a conversion on x to yield pretty=>big? Any suggestions? Use parse, but don't evaluate the results: x1 <- 'pretty\\u003D\\u003Ebig' x2 <- parse(text = paste0("'", x1, "'")) x3 <- x2[[1]] x3 # [1] "pretty=>big" is.character
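The parse trick lets the language's own literal parser interpret the escapes. A loosely analogous Python sketch uses the built-in unicode_escape codec for the same purpose. It assumes the input is pure ASCII, which holds for the example string, and it is not a translation of the R code.

```python
# The raw string holds a literal backslash followed by "u003D", and so on.
x = r"pretty\u003D\u003Ebig"

# Encode to ASCII bytes, then let the unicode_escape codec interpret the escapes.
# Assumption: x is pure ASCII; non-ASCII input would fail at encode("ascii").
decoded = x.encode("ascii").decode("unicode_escape")

print(decoded)
```

As with parse in R, the codec also interprets other backslash sequences such as \n, so it is best suited to input that is known to contain only \u escapes.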
