metacharacters | 易学教程

strsplit in R with a metacharacter

Question: I have a large amount of data where the delimiter is a backslash. I'm processing it in R and I'm having a hard time working out how to split the string, since the backslash is a metacharacter. For example, a string looks like this:

    1128\0019\XA5\E2R\366\00=15

I want to split it on the \ character, but when I run the strsplit command I get an error:

    strsplit(tempStr, "\\")
    Error in strsplit(tempStr, "\\") : invalid regular expression '\', reason 'Trailing backslash'

When I try to use the "fixed"
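The error in the question comes from two layers of escaping: the string literal and the regex engine each consume one level of backslash. As a rough illustration in Python rather than R (the variable names are illustrative, and the sample value is the one quoted in the question), a fixed-string split needs no regex escaping, while a regex split needs the backslash escaped inside the pattern:

```python
import re

# Raw string, so every backslash in the sample is stored literally.
temp_str = r"1128\0019\XA5\E2R\366\00=15"

# Fixed-string split: no regex engine is involved, so one literal
# backslash (written "\\" in a normal string literal) is the delimiter.
fixed_parts = temp_str.split("\\")

# Regex split: the pattern must itself contain an escaped backslash,
# which is two characters, written here as a raw string.
regex_parts = re.split(r"\\", temp_str)

# re.escape builds the escaped pattern from the literal delimiter.
escaped_delimiter = re.escape("\\")
```

Both approaches give the same six fields; the fixed-string form is usually the easier one to read when the delimiter is a single known character.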

What are dangling metacharacters in regex?

Question: In Ruby, I wrote a simple regex to find the first {:

    txt.gsub! /^.*{/, '{'

Whenever I run this, everything after that point works fine for my purposes, but I get a mild warning along the lines of "WARNING: Dangling metacharacter detected". What specifically are dangling metacharacters, and how would I change my regex to be as explicit and efficient as possible?

Answer 1: { has special meaning in regular expressions. PATTERN{m,n} matches PATTERN repeated between m and n times. If you want
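A minimal sketch of the explicit form, written with Python's `re` module rather than Ruby (so the Ruby warning itself is not reproduced here). The sample text is made up for illustration. Escaping the brace removes any ambiguity with the `{m,n}` quantifier, and the choice between greedy and non-greedy `.*` decides whether the last or the first `{` is kept:

```python
import re

# Hypothetical sample input with two opening braces.
txt = "a { b { c"

# Escaped brace, greedy ".*": everything up to the LAST "{" is dropped.
up_to_last = re.sub(r"^.*\{", "{", txt)

# Escaped brace, non-greedy ".*?": everything up to the FIRST "{" is dropped.
up_to_first = re.sub(r"^.*?\{", "{", txt, count=1)
```

Since the question talks about the first `{`, the non-greedy variant is probably the closer match to the stated intent.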

Java Regex with “Joker” characters

Question: I am trying to write a regex that validates an input field. What I call "joker" characters are '?' and '*'. Here is my Java regex:

    "^$|[^\\*\\s]{2,}|[^\\*\\s]{2,}[\\*\\?]|[^\\*\\s]{2,}[\\?]{1,}[^\\s\\*]*[\\*]{0,1}"

What I'm trying to match is:

- A minimum of 2 alphanumeric characters (other than '?' and '*')
- The '*' can appear only once, and only at the end of the string
- The '?' can appear multiple times
- No whitespace at all

So, for example: abcd = OK, ?bcd = OK, ab?? = OK, ab* = OK, ab?* = OK, ??cd = OK, *ab = NOT OK, ??
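One possible reading of the rules above, sketched in Python rather than Java. The names `JOKER_RE` and `is_valid` are hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits, that the two required alphanumerics may appear anywhere in the input, and that empty input is rejected (the original regex accepts it through its `^$` branch). The lookahead counts alphanumerics; the body then allows only alphanumerics and `?`, followed by at most one trailing `*`:

```python
import re

# Lookahead: at least two ASCII alphanumerics somewhere in the input.
# Body: alphanumerics and "?" only, then an optional single "*" at the end.
JOKER_RE = re.compile(r"(?=(?:[^A-Za-z0-9]*[A-Za-z0-9]){2})[A-Za-z0-9?]+\*?")


def is_valid(value: str) -> bool:
    """Return True if the whole input satisfies the rules as read above."""
    return JOKER_RE.fullmatch(value) is not None


# Complete examples quoted in the question.
accepted = [v for v in ["abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd"] if is_valid(v)]
rejected = [v for v in ["*ab", "ab cd", "ab**"] if not is_valid(v)]
```

Using `fullmatch` avoids the need for explicit `^` and `$` anchors in each alternative, which is where the original pattern leaves room for partial matches.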

Replace tabs (“\t”) in flat file with “Unit Separator” (0x1f) in C#

Question: I have been having trouble finding the metacharacter for the 'Unit Separator' to replace the tabs in a flat file. So far I have this:

    File.WriteAllLines(outputFile, File.ReadLines(inputFile).Select(t => t.Replace("\t", "\0x1f"))); //this does not work

I have also tried:

    File.WriteAllLines(outputFile, File.ReadLines(inputFile).Select(t => t.Replace("\t", "\u"))); //also doesn't work

and:

    File.WriteAllLines(outputFile, File.ReadLines(inputFile).Select(t => t.Replace("\t", 0x1f))); //also
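The first attempt fails because `"\0x1f"` is a NUL character followed by the three literal characters `x1f`, not a single escape. A short sketch of the intended replacement in Python rather than C# (the function name and sample lines are illustrative; the file reading and writing from the question are left out):

```python
# U+001F, the ASCII Unit Separator. "\x1f", "\u001f" and chr(0x1F)
# all denote this same single character.
UNIT_SEPARATOR = "\x1f"


def tabs_to_unit_separator(line: str) -> str:
    """Replace every tab in one line with the Unit Separator character."""
    return line.replace("\t", UNIT_SEPARATOR)


# Hypothetical sample lines standing in for the flat file's contents.
converted = [tabs_to_unit_separator(line) for line in ["a\tb\tc", "no tabs here"]]

# The mistaken escape from the first attempt is four characters long.
mistaken = "\0x1f"
```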

Escape all metacharacters in Python

Question: I need to search for patterns which may contain many metacharacters. Currently I use a long regex:

    prodObjMatcher=re.compile(r"""^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)""", re.S|re.M|re.I|re.X)

(My actual pattern is very long, so I have pasted only the relevant portion I need help with.) This is especially painful when I need to write combinations of such patterns in a single re compilation. Is there a Pythonic way of shortening the pattern?

Answer 1: Look, your pattern can be reduced to r"""^(?P
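The answer's own reduction is cut off above, so the following is only a sketch of the general idea, not a reconstruction of it. Inside a character class most punctuation is already literal, so only the square brackets need a backslash here; and when an entire literal string has to be matched, `re.escape` does the escaping automatically. The sample strings are invented for illustration:

```python
import re

FLAGS = re.S | re.M | re.I

# The character class from the question, with every symbol escaped.
verbose = re.compile(r"^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)", FLAGS)

# Same class with only the brackets escaped; the rest are literal in a class.
compact = re.compile(r"^(?P<nodeName>[\w/:\[\]<>@$]+)", FLAGS)

sample = "node/a:b[0]<x>@$ rest of line"
verbose_name = verbose.match(sample).group("nodeName")
compact_name = compact.match(sample).group("nodeName")

# For matching a whole literal string, let re.escape add the backslashes.
literal = "a.b*c[0]"
literal_re = re.compile(re.escape(literal))
```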

Text mining with tm package in R: remove words starting from [http] or any other specific word

Question: I am new to R and text mining. I made a word cloud out of a Twitter feed related to some term. The problem I'm facing is that the word cloud shows http:... or htt... How do I deal with this? I tried using the metacharacter *, but I doubt I'm applying it correctly:

    tw.text = removeWords(tw.text,c(stopwords("en"),"rt","http\\*"))

Could somebody who is into text mining please help me with this?

Answer 1: If you are looking to remove URLs from your string, you may use:

    gsub("(f|ht)tp(s?)://(.*)[.][a-z]+", "", x)

where x would be:

    x <- c("some text http://idontwantthis.com", "same problem again http:/
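For comparison, here is the answer's pattern carried over unchanged to Python's `re.sub` (a sketch, not the R code itself; only the first, complete sample string from the answer is used). It also shows why the question's `http\\*` did not help: an escaped `*` matches a literal asterisk rather than acting as a wildcard.

```python
import re

# The pattern from the answer, unchanged.
URL_RE = re.compile(r"(f|ht)tp(s?)://(.*)[.][a-z]+")

texts = ["some text http://idontwantthis.com"]
cleaned = [URL_RE.sub("", text) for text in texts]

# An escaped "*" is a literal asterisk, so this only matches the text "http*".
star_in_url = re.search(r"http\*", "http://idontwantthis.com")
star_literal = re.search(r"http\*", "http*")
```

Note that the greedy `(.*)` runs to the last `.` followed by lowercase letters in the string, so text after a URL can be removed along with it.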

How to match a string that contains exactly 3 occurrences of a special character in Perl

Question: I have tried a few ways to match a word that contains exactly 3 slashes, but none of them work. Here is the example:

    @array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t )
    foreach my $string (@array){
        if ( $string =~ /^\/{3}/ ){
            print " yes, word with 3 / found !\n";
            print "$string\n";
        }
        else {
            print " no word contain 3 / found\n";
        }

These are a few of the matches I tried, none of which work:

    $string =~ /^\/{3}/;
    $string =~ /^(\w+\/\w+\/\w+\/\w+)/;
    $string =~ /^(.*\/.*\/.*\/.*)/;

Is there any other way I can match this type of string and print the string?

Answer 1: Match globally and compare the number of matches with 3 if ( ( () = m{/
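The counting approach the answer describes can be sketched in Python as follows (the Perl answer itself is cut off above; the function and constant names here are illustrative). The question's `/^\/{3}/` fails because it asks for three consecutive slashes at the start of the string, and the other two attempts are not anchored at the end, so they also accept strings with more than three slashes. An anchored single-regex alternative is shown next to the count:

```python
import re

# The four words from the question.
words = ["abc/ab1/abc/abc", "a2/b1/c3/d4/ee", "w/5/a", "s/t"]


def has_exactly_three_slashes(word: str) -> bool:
    """Match globally and compare the number of matches with 3."""
    return len(re.findall("/", word)) == 3


# Alternative: non-slash runs separated by exactly three slashes,
# applied with fullmatch so the whole word has to fit.
EXACTLY_THREE = re.compile(r"[^/]*(?:/[^/]*){3}")

by_count = [w for w in words if has_exactly_three_slashes(w)]
by_regex = [w for w in words if EXACTLY_THREE.fullmatch(w)]
```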

Extended regular expressions (ERE) for .gitignore

Question: Is there a way to use extended regular expressions (ERE) in a .gitignore file? For example, I want to use the + repetition character in a .gitignore file. Is there a way to do that?

Answer 1: As illustrated here and detailed in "this question", the function fnmatch() is used to interpret glob patterns, which means regular expressions are not supported. This is what the gitignore man page mentions: Otherwise, git treats the pattern as a shell glob suitable for consumption by fnmatch(3) with the FNM
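The difference between glob matching and regex matching can be illustrated with Python's standard `fnmatch` module. This is only an analogy: Python's `fnmatch` is not Git's matcher, and the file names below are invented. In a glob, `+` has no special meaning and is matched as a literal character, whereas in a regex it is a repetition operator:

```python
import fnmatch
import re

# Hypothetical file names.
names = ["a.log", "aaa.log", "a+.log"]

# Glob semantics: "+" is an ordinary character.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "a+.log")]

# Regex semantics: "+" means "one or more of the preceding item".
regex_hits = [n for n in names if re.fullmatch(r"a+\.log", n)]
```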