unicode-string

How do I isolate a space using RegExp in VBA (\s vs. \p{Zs})?

馋奶兔 submitted on 2019-11-28 02:11:11
Introduction/Question: I have been studying the use of Regular Expressions (using VBA/Excel), and so far I cannot understand how I would isolate a <space> (or " ") using regexp from the other white space characters that are included in \s. I thought that I would be able to use \p{Zs}, but in my testing so far, it has not worked out. Could someone please correct my misunderstanding? I appreciate any helpful input. To offer proper credit, I modified some code that started off as a very helpful post by @Portland Runner that is found here: How to use Regular Expressions (Regex) in Microsoft Excel
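
To make the distinction in the question concrete, here is a minimal sketch. It uses Python's `re` module rather than the VBA RegExp object, purely as an assumption so the example is runnable. `\s` matches tabs, newlines and other whitespace, while a literal space in a character class (`[ ]`, or equivalently `\x20`) matches only U+0020. Python's `re` does not support `\p{Zs}`, and as far as I know the VBScript engine does not either, so the literal-space form is the portable one.

```python
import re

# Sample containing a space, a tab, a newline and a no-break space (U+00A0).
sample = "a b\tc\nd\u00a0e"

# \s matches every whitespace character, not only the plain space.
all_whitespace = re.findall(r"\s", sample)

# A literal space inside a character class matches only U+0020.
only_spaces = re.findall(r"[ ]", sample)

print(all_whitespace)
print(only_spaces)
```

The same idea should carry over to a VBA pattern such as `[ ]` or `\x20`, though that is not tested here.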

Convert Unicode character to NSString

久未见 submitted on 2019-11-28 01:23:08
I have received a string from a web service that contains a Unicode character, and I want to convert it to a plain NSString. How can I do that? For example: "This isn\u0092t your bike". How can I remove the Unicode escape and replace it with the equivalent special character? The output should be: "This isn't your bike" char cString[] = "This isn\u2019t your bike"; NSData *data = [NSData dataWithBytes:cString length:strlen(cString)]; NSString *string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding]; NSLog(@"result string: %@", string); This should work. UPDATE FOR THE COMMENT: The unicode character

Regex for a (twitter-like) hashtag that allows non-ASCII characters

天涯浪子 submitted on 2019-11-27 15:06:53
I want a regex to match a simple hashtag like those on Twitter (e.g. #someword). I also want it to recognize non-standard characters (like those in Spanish, Hebrew or Chinese). This was my initial regex: (^|\s|\b)(#(\w+))\b --> but it doesn't recognize non-standard characters. Then I tried using XRegExp.js, which worked, but ran too slowly. Any suggestions for how to do it? Eventually I found this useful link: twitter-text.js, which is basically how Twitter solves this problem. With native JS regexes that don't support unicode, your only option is to explicitly enumerate characters that
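
For comparison, here is a minimal sketch in Python (an assumption; the question itself is about JavaScript), where `\w` is Unicode-aware for `str` patterns. The pattern and the helper name `find_hashtags` are illustrative only and do not reproduce Twitter's actual rules.

```python
import re

# In Python 3, \w on str patterns matches Unicode letters and digits,
# so no explicit enumeration of character ranges is needed.
HASHTAG = re.compile(r"(?:^|\s)#(\w+)")

def find_hashtags(text):
    """Return hashtag names that start the text or follow whitespace."""
    return HASHTAG.findall(text)

tags = find_hashtags("hola #ni\u00f1o and #שלום plus #中文 but not a#b")
print(tags)
```

Requiring whitespace or start of string before `#` keeps strings like `a#b` from being treated as hashtags.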

Why is the length of this string longer than the number of characters in it?

为君一笑 submitted on 2019-11-27 10:53:55
This code: string a = "abc"; string b = "A𠈓C"; Console.WriteLine("Length a = {0}", a.Length); Console.WriteLine("Length b = {0}", b.Length); outputs: Length a = 3 Length b = 4 Why? The only thing I could imagine is that the Chinese character is 2 bytes long and that the .Length method returns the byte count. Everyone else is giving the surface answer, but there's a deeper rationale too: the number of "characters" is a difficult-to-define question and can be surprisingly expensive to compute, whereas a length property should be fast. Why is it difficult to define? Well, there are a few options
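
The two most common ways of counting can be reproduced with a short sketch. It is written in Python rather than C# (an assumption for the sake of a runnable example), and the helper names are hypothetical. .NET's `string.Length` reports UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```python
def utf16_code_units(s):
    """Count UTF-16 code units, which is what .NET's string.Length reports."""
    return len(s.encode("utf-16-le")) // 2

def code_points(s):
    """Count Unicode code points, which is what Python's len() reports."""
    return len(s)

for text in ("abc", "A𠈓C"):
    print(text, utf16_code_units(text), code_points(text))
```

Neither count is the number of user-perceived characters in general; that requires grapheme cluster segmentation, which is more expensive.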

Java Unicode String length

微笑、不失礼 submitted on 2019-11-27 10:25:33
Question: I am trying hard to get the character count of a Unicode string and have tried various options. It looks like a small problem, but I am badly stuck. Here I am trying to get the length of the string str1. I am getting 6, but it is actually 3. Moving the cursor over the string "குமார்" also shows it as 3 characters. Basically, I want to measure the length and print each character, like "கு", "மா", "ர்". public class one { public static void main(String[] args) { String str1 = new String("குமார்"); System.out
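
The gap between 6 and 3 comes from combining marks: each visible Tamil unit here is a base letter plus a vowel sign or virama. The sketch below is in Python rather than Java (an assumption), and the helper `simple_clusters` is hypothetical. It attaches every combining mark (general category `M*`) to the preceding character, which is a simplification of the full grapheme cluster rules in Unicode UAX #29 but is enough for this string.

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified: real grapheme segmentation (UAX #29) has more rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the previous base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

word = "குமார்"
print(len(word))              # number of code points
print(simple_clusters(word))  # user-perceived units
```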

What is the range of Unicode Printable Characters?

懵懂的女人 submitted on 2019-11-27 07:57:34
Can anybody please tell me what the range of Unicode printable characters is? [e.g. the ASCII printable character range is \u0020 - \u007e] See http://en.wikipedia.org/wiki/Unicode_control_characters You might want to look especially at the C0 and C1 control characters: http://en.wikipedia.org/wiki/C0_and_C1_control_codes The wiki says the C0 control characters are in the range U+0000 to U+001F plus U+007F (the same range as in ASCII), and the C1 control characters are in the range U+0080 to U+009F. Apart from the C0 and C1 control characters, Unicode also has hundreds of formatting control characters, e.g. zero-width non
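
Because control and format characters are scattered through the code space, a check by general category is usually more practical than a single range. The sketch below is in Python, and its definition of "printable" is an assumption made for illustration: anything in an "Other" category (`Cc`, `Cf`, `Cs`, `Co`, `Cn`) or a line or paragraph separator (`Zl`, `Zp`) is treated as non-printable.

```python
import unicodedata

def is_printable(ch):
    """Rough printability test based on the Unicode general category.

    Assumption: C* categories and Zl/Zp are non-printable, all else is.
    """
    category = unicodedata.category(ch)
    return not (category.startswith("C") or category in ("Zl", "Zp"))

for ch in ("A", " ", "\u007f", "\u0085", "\u200c", "\u00e9"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_printable(ch))
```

Python's built-in `str.isprintable()` applies a similar but stricter rule, treating every separator except the ASCII space as non-printable.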

PDO and UTF-8 special characters in PHP / MySQL?

这一生的挚爱 submitted on 2019-11-27 07:45:29
Question: I am using MySQL and PHP 5.3 and tried this code. $dbhost = 'localhost'; $dbuser = 'root'; $dbpass = ''; $con = mysql_connect("localhost", "root", ""); mysql_set_charset('utf8'); if (!$con) { die('Could not connect: ' . mysql_error()); } mysql_select_db("kdict", $con); $sql = "SELECT * FROM `en-kh` where english='a'"; echo $sql; $result = mysql_query($sql); while($row = mysql_fetch_array($result)) { echo $row['english'] . " </br> " . $row['khmer']; echo "<br />"; } ?> => I got good UTF-8

Java: How to create unicode from string “\u00C3” etc

混江龙づ霸主 submitted on 2019-11-27 06:54:49
Question: I have a file that has strings hand-typed as \u00C3. I want to create the Unicode character represented by that escape in Java. I tried but could not find how. Help. Edit: When I read the text file, the String will contain "\u00C3" not as a Unicode character but as the ASCII chars '\' 'u' '0' '0' 'C' '3'. I would like to form the Unicode character from that ASCII string. Answer 1: I picked this up somewhere on the web: String unescape(String s) { int i=0, len=s.length(); char c; StringBuffer sb = new StringBuffer
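
The underlying idea is to find each literal `\uXXXX` sequence and replace it with the character whose code point is the four hex digits. Here is a minimal sketch in Python rather than Java (an assumption, and not the truncated answer's code). The name `unescape_unicode` is hypothetical.

```python
import re

# A literal backslash, the letter u, then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def unescape_unicode(s):
    """Replace literal \\uXXXX sequences with the character they denote.

    Limitation: a surrogate pair written as two escapes is not merged.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)

print(unescape_unicode("caf\\u00E9"))
print(unescape_unicode(r"pretty\u003D\u003Ebig"))
```

Text that is not part of a well-formed escape is left unchanged.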

Converting a \u escaped Unicode string to ASCII

北战南征 submitted on 2019-11-27 05:06:32
After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page, and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this to the ASCII string, which should be 'pretty=>big'. More simply, if I set x <- 'pretty\\u003D\\u003Ebig' How do I perform a conversion on x to yield pretty=>big? Any suggestions? Use parse, but don't evaluate the results: x1 <- 'pretty\\u003D\\u003Ebig' x2 <- parse(text = paste0("'", x1, "'")) x3 <- x2[[1]] x3 # [1] "pretty=>big" is.character

How do I isolate a space using RegExp in VBA (\s vs. \p{Zs})?

帅比萌擦擦* submitted on 2019-11-27 04:52:00
Question: Introduction/Question: I have been studying the use of Regular Expressions (using VBA/Excel), and so far I cannot understand how I would isolate a <space> (or " ") using regexp from the other white space characters that are included in \s. I thought that I would be able to use \p{Zs}, but in my testing so far, it has not worked out. Could someone please correct my misunderstanding? I appreciate any helpful input. To offer proper credit, I modified some code that started off as a very helpful