non-greedy | 易学教程

Non-greedy Regular Expression in Java

Question: I have the following code:

    public static void createTokens() {
        String test = "test is a word word word word big small";
        Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)").matcher(test);
        while (mtch.find()) {
            for (int i = 1; i <= mtch.groupCount(); i++) {
                System.out.println(mtch.group(i));
            }
        }
    }

It produces this output:

    word
    w

But in my opinion it should be:

    word
    word

Can somebody please explain why?

Answer 1: Because your patterns are non-greedy, so they matched as little text as
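The behavior in this question can be reproduced in a few lines. The sketch below uses Python's re module rather than Java (an assumption made for brevity; both engines backtrack the same way for these constructs). The \S+ rewrite is only one guess at what the asker intended, not the fix from the original answer.

```python
import re

test = "test is a word word word word big small"

# Same pattern as in the question: nothing follows the final lazy group,
# so nothing forces it to grow beyond a single character.
lazy_groups = re.search(r"test is a (\s*.+?\s*) word (\s*.+?\s*)", test).groups()

# Hypothetical rewrite: capture whole whitespace-delimited tokens with \S+
# instead of a lazy wildcard.
token_groups = re.search(r"test is a (\S+) word (\S+)", test).groups()

print(lazy_groups)   # ('word', 'w')
print(token_groups)  # ('word', 'word')
```

A lazy quantifier at the very end of a pattern always stops at its minimum, so it usually needs something after it (a delimiter or an anchor) to be useful.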

Python non-greedy regex to clean xml

I have an "XML file" that has some unwanted characters in it:

    <data>
    <tag>blar </tag><tagTwo> bo </tagTwo>
    some extra characters not enclosed that I want to remove
    <anothertag>bbb</anothertag>
    </data>

I thought the following non-greedy substitution would remove the characters that are not properly enclosed in <sometag></sometag>:

    re.sub("</([a-zA-Z]+)>.*?<","</\\1><",text)

The pattern remembers the closing tag and then reads everything until the next '<' (non-greedy); the replacement puts the tag back and reopens the next tag; text is the XML text.

This regex seems only to find the position indicated with the [[]] in </tag>[[]]
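One plausible cause, assuming the sample really spans several lines (the excerpt is flattened, so the line breaks below are a guess): by default `.` does not match a newline, so `.*?` cannot reach a `<` on a later line. A minimal sketch with re.DOTALL:

```python
import re

# Assumed layout of the sample; the original line breaks are not preserved.
text = (
    "<data>\n"
    "<tag>blar </tag><tagTwo> bo </tagTwo>\n"
    "some extra characters not enclosed that I want to remove\n"
    "<anothertag>bbb</anothertag>\n"
    "</data>"
)

pattern = r"</([a-zA-Z]+)>.*?<"

# Without DOTALL only the empty gap between </tag> and <tagTwo> matches,
# so the substitution changes nothing.
unchanged = re.sub(pattern, r"</\1><", text)

# With DOTALL the lazy .*? may cross line breaks up to the next '<'.
cleaned = re.sub(pattern, r"</\1><", text, flags=re.DOTALL)

print(cleaned)
```

Note that this also removes the newlines between elements. For anything beyond a one-off cleanup, a real XML parser is usually a safer tool than a regex.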

Non greedy regex

I need to get the value inside some tags in a comment in a PHP file like this:

    php code
    /* this is a comment
    !-
    <titulo>titulo3</titulo>
    <funcion>
    <descripcion>esta es la descripcion de la funcion 6</descripcion>
    </funcion>
    <funcion>
    <descripcion>esta es la descripcion de la funcion 7</descripcion>
    </funcion>
    <otros>
    <descripcion>comentario de otros 2a hoja</descripcion>
    </otros>
    -! */
    some php code

As you can see, the file has newlines and repetitions of tags like <funcion></funcion>, and I need to get every single one of the tags, so I was trying something like this: preg_match_all("/(<funcion>)(.*
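The asker's PHP pattern is cut off, so the following is not a reconstruction of it. It is a small Python sketch of the general technique the question calls for: a lazy quantifier combined with a dot-matches-newline flag, so that each repeated block is captured separately. The line layout of the sample is assumed.

```python
import re

comment = """/* this is a comment
!-
<titulo>titulo3</titulo>
<funcion>
<descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
<descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
<descripcion>comentario de otros 2a hoja</descripcion>
</otros>
-! */"""

# Lazy + DOTALL: each <funcion>...</funcion> pair becomes its own match.
blocks = re.findall(r"<funcion>(.*?)</funcion>", comment, flags=re.DOTALL)

# Greedy + DOTALL: one match running from the first <funcion> to the last </funcion>.
greedy_blocks = re.findall(r"<funcion>(.*)</funcion>", comment, flags=re.DOTALL)

descriptions = [
    re.search(r"<descripcion>(.*?)</descripcion>", block, flags=re.DOTALL).group(1)
    for block in blocks
]
print(descriptions)
```

In PCRE the corresponding switches are the s modifier (dot matches newline) and either a lazy quantifier or the U modifier.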

Regex is behaving lazy, should be greedy

Question: I thought that by default my Regex would exhibit the greedy behavior that I want, but it does not in the following code:

    Regex keywords = new Regex(@"in|int|into|internal|interface");
    var targets = keywords.ToString().Split('|');
    foreach (string t in targets)
    {
        Match match = keywords.Match(t);
        Console.WriteLine("Matched {0,-9} with {1}", t, match.Value);
    }

Output:

    Matched in        with in
    Matched int       with in
    Matched into      with in
    Matched internal  with in
    Matched interface with in

Now I realize that I
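Greediness applies to quantifiers, not to alternation: a backtracking engine tries the alternatives left to right and keeps the first one that succeeds. The sketch below shows this in Python (chosen for brevity; .NET orders alternatives the same way) together with two possible workarounds. Neither is taken from the truncated question.

```python
import re

keywords = ["in", "int", "into", "internal", "interface"]

# Ordered alternation: "in" is tried first and succeeds for every keyword.
ordered = re.compile("|".join(keywords))
first_wins = [ordered.match(k).group() for k in keywords]

# Option 1: list the longest alternatives first.
longest_first = re.compile("|".join(sorted(keywords, key=len, reverse=True)))
longest = [longest_first.match(k).group() for k in keywords]

# Option 2: keep the order but require the whole input to match, which
# forces the engine to backtrack into later alternatives.
whole = [ordered.fullmatch(k).group() for k in keywords]

print(first_wins)  # ['in', 'in', 'in', 'in', 'in']
print(longest)     # ['in', 'int', 'into', 'internal', 'interface']
```

When the keywords appear inside longer text, word boundaries such as \b(?:...)\b play the same role as the whole-input requirement.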

Non-greedy regex quantifier gives greedy result

Question: I have a .NET regex which I am testing using Windows PowerShell. The output is as follows:

    > [System.Text.RegularExpressions.Regex]::Match("aaa aaa bbb", "aaa.*?bbb")

    Groups   : {aaa aaa bbb}
    Success  : True
    Captures : {aaa aaa bbb}
    Index    : 0
    Length   : 11
    Value    : aaa aaa bbb

My expectation was that using the ? quantifier would cause the match to be aaa bbb, as the second group of a's is sufficient to satisfy the expression. Is my understanding of non-greedy quantifiers flawed, or am I testing
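A regex engine reports the leftmost match first. A lazy quantifier only controls how far the match extends from that starting point, not where it starts. The sketch below reproduces the result in Python (assumed to behave like .NET here) and shows one possible way to move the start to the last "aaa".

```python
import re

text = "aaa aaa bbb"

# The match starts at index 0 because a match is possible from there;
# .*? then extends only as far as needed to reach "bbb".
lazy = re.search(r"aaa.*?bbb", text).group()

# A greedy prefix pushes the start as far right as possible, so the group
# begins at the last "aaa" that can still reach "bbb".
last_start = re.search(r".*(aaa.*?bbb)", text).group(1)

print(repr(lazy))        # 'aaa aaa bbb'
print(repr(last_start))  # 'aaa bbb'
```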

How to non-greedy multiple lookbehind matches

Question:

    Source:  <prefix><content1><suffix1><prefix><content2><suffix2>
    Engine:  PCRE
    RegEx1:  (?<=<prefix>)(.*)(?=<suffix1>)
    RegEx2:  (?<=<prefix>)(.*)(?=<suffix2>)
    Result1: <content1>
    Result2: <content1><suffix1><prefix><content2>

The desired result for RegEx2 is just <content2>, but it is obviously greedy. How do I make RegEx2 non-greedy and use only the last matching lookbehind? [I hope I have translated this correctly from the NoteTab syntax. I don't do much RegEx coding. The <prefix>, <content> &
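Making `.*` lazy would not help here, because the match still starts after the first <prefix>. One common workaround is to forbid the wildcard from running across another <prefix>, using a negative lookahead at every step. The sketch below uses Python's re module, on the assumption that its lookaround syntax matches PCRE for this pattern. The literal <prefix>, <content2> and <suffix2> strings are the placeholders from the question.

```python
import re

source = "<prefix><content1><suffix1><prefix><content2><suffix2>"

# Original RegEx2: starts after the first <prefix> and runs to <suffix2>.
greedy = re.search(r"(?<=<prefix>)(.*)(?=<suffix2>)", source).group(1)

# (?:(?!<prefix>).)* consumes one character at a time, but only while the
# text ahead does not begin another "<prefix>".
tempered = re.search(
    r"(?<=<prefix>)((?:(?!<prefix>).)*)(?=<suffix2>)", source
).group(1)

print(greedy)    # <content1><suffix1><prefix><content2>
print(tempered)  # <content2>
```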

std::regex_match and lazy quantifier with strange behavior

I know that a lazy quantifier matches as few as possible (shortest match).

I also know that the constructor is:

    basic_regex( ..., flag_type f = std::regex_constants::ECMAScript );

And: ECMAScript supports non-greedy matches, and the ECMAScript regex "<tag[^>]*>.*?</tag>" would match only until the first closing tag ... (en.cppreference)

And: At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... (en.cppreference)

And: Note that regex_match will only successfully match a regular expression to
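The difference between matching a whole sequence and searching inside it can be illustrated with Python's re.fullmatch and re.search, used here as rough analogues of std::regex_match and std::regex_search (an analogy for illustration, not the C++ API itself). The input string is a made-up example.

```python
import re

html = "<tag>a</tag><tag>b</tag>"  # hypothetical input
pattern = re.compile(r"<tag[^>]*>.*?</tag>")

# Searching: the lazy .*? stops at the first closing tag.
partial = pattern.search(html).group()

# Whole-input matching: the lazy quantifier is forced to keep expanding
# until the pattern covers the entire string.
entire = pattern.fullmatch(html).group()

print(partial)  # <tag>a</tag>
print(entire)   # <tag>a</tag><tag>b</tag>
```

A lazy quantifier prefers the shortest extension, but it still grows whenever a shorter attempt cannot satisfy the rest of the requirements, including a requirement to consume all of the input.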

Regular expression in regards to question mark “lazy” mode

Question: I understand the ? mark here means "lazy". My question essentially is: [0-9]{2}? vs [0-9]{2}. Are they the same? If so, why would we write the former expression? Isn't lazy mode more expensive performance-wise? If not, can you tell me the difference?

Answer 1: There is no difference between [0-9]{2} and [0-9]{2}?. The difference between greedy matching and lazy matching (the addition of a ?) has to do with backtracking. Regular expression engines are built to match text from left to right. Therefore it is logical that when you ask an expression to match a range of character(s), it matches as many as
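A short check of the claim, sketched in Python (the sample string is made up): with a fixed count the engine has no choice to make, so the lazy and greedy forms agree. They only diverge once the quantifier allows a range.

```python
import re

digits = "12345"  # hypothetical sample input

# Fixed count: exactly two digits either way.
fixed_greedy = re.findall(r"[0-9]{2}", digits)
fixed_lazy = re.findall(r"[0-9]{2}?", digits)

# Range: greedy takes as many as allowed, lazy takes as few as allowed.
range_greedy = re.search(r"[0-9]{2,4}", digits).group()
range_lazy = re.search(r"[0-9]{2,4}?", digits).group()

print(fixed_greedy, fixed_lazy)  # ['12', '34'] ['12', '34']
print(range_greedy, range_lazy)  # 1234 12
```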

remove text between delimiters, multiple times on each line

Question: I need to remove text between the delimiters "<" and ">", but there are multiple instances of these on each line of my text file. For example, I want to turn this:

    person 1, person 2<email2@mail.com>, person 3<email3@mail.com>, person 4<email4@mail.com>

Into this:

    person 1, person 2, person 3, person 4

I've tried a few things, including sed:

    sed -e 's/<.*>//' filename.csv

but this removes everything between the first < and the last >, giving the result person 1, person 2.

Answer 1: you can
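The original answer is cut off, so the following is an independent sketch in Python rather than the answerer's solution. It contrasts the greedy pattern from the question with a negated character class, which stays within one pair of delimiters and does not depend on lazy-quantifier support.

```python
import re

line = "person 1, person 2<email2@mail.com>, person 3<email3@mail.com>, person 4<email4@mail.com>"

# Greedy: spans from the first '<' to the last '>' on the line.
greedy = re.sub(r"<.*>", "", line)

# Negated class: each match ends at the nearest '>'.
negated = re.sub(r"<[^>]*>", "", line)

print(greedy)   # person 1, person 2
print(negated)  # person 1, person 2, person 3, person 4
```

The negated-class form also works in POSIX tools whose regex syntax has no lazy quantifiers. In sed it additionally needs the g flag so that every occurrence on the line is replaced.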