non-greedy

Non-greedy Regular Expression in Java

十年热恋 submitted on 2019-12-09 15:17:47
Question: I have the following code:

    public static void createTokens() {
        String test = "test is a word word word word big small";
        Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)").matcher(test);
        while (mtch.find()) {
            for (int i = 1; i <= mtch.groupCount(); i++) {
                System.out.println(mtch.group(i));
            }
        }
    }

and I get the following output:

    word
    w

But in my opinion it should be:

    word
    word

Can somebody please explain why?

Answer 1: Because your patterns are non-greedy, so they matched as little text as …
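The behavior can be reproduced outside Java. Below is a minimal sketch in Python, assuming that Python's `re` treats these lazy quantifiers the same way `java.util.regex` does (it does for the constructs used here). The second group ends as soon as the pattern is satisfied, because nothing after it forces it to grow. Swapping the lazy groups for `\S+` (a run of non-whitespace characters) is one possible fix, not the only one; the `groups` helper is hypothetical.

```python
import re

TEXT = "test is a word word word word big small"

# The asker's pattern: both groups use the lazy quantifier +?.
LAZY = r"test is a (\s*.+?\s*) word (\s*.+?\s*)"

# One possible alternative: \S+ consumes a whole whitespace-free token.
TOKEN = r"test is a (\S+) word (\S+)"


def groups(pattern, text):
    """Hypothetical helper: capture groups of the first match, or () if none."""
    m = re.search(pattern, text)
    return m.groups() if m else ()


if __name__ == "__main__":
    # Group 2 is the last thing in the pattern, so the lazy .+? is
    # satisfied after a single character: "w".
    print(groups(LAZY, TEXT))   # ('word', 'w')
    print(groups(TOKEN, TEXT))  # ('word', 'word')
```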

Python non-greedy regex to clean xml

烈酒焚心 submitted on 2019-12-06 07:50:32
I have an "XML file" that contains some unwanted characters:

    <data> <tag>blar </tag><tagTwo> bo </tagTwo> some extra characters not enclosed that I want to remove <anothertag>bbb</anothertag> </data>

I thought the following non-greedy substitution would remove the characters that are not properly enclosed in <sometag></sometag>:

    re.sub("</([a-zA-Z]+)>.*?<","</\\1><",text)

Here text is the XML text, ([a-zA-Z]+) remembers the tag name, .*?< reads everything until the next '<' (non-greedy), and the replacement </\\1>< puts the tag back without the stray characters and reopens the next tag. This regex seems only to find the position indicated with [[]] in </tag>[[]] …
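Below is a minimal sketch of one way to write this substitution in Python. It is an assumption, not a confirmed diagnosis of the asker's problem: the trailing `<` is moved into a lookahead `(?=<)` so it is checked but not consumed, and `re.DOTALL` lets `.` cross newlines in case the stray text spans several lines. The name `strip_loose_text` is hypothetical. Regex-based XML editing is fragile; this only handles a flat layout like the sample.

```python
import re

# Hypothetical helper pattern: text between a closing tag and the next '<'.
# (?=<) is a lookahead, so the '<' of the next tag is left in place
# and does not need to be written back in the replacement.
# re.DOTALL lets '.' match newlines (assumption: stray text may span lines).
LOOSE_TEXT = re.compile(r"</([a-zA-Z]+)>.*?(?=<)", re.DOTALL)

SAMPLE = ("<data> <tag>blar </tag><tagTwo> bo </tagTwo> some extra characters "
          "not enclosed that I want to remove <anothertag>bbb</anothertag> </data>")


def strip_loose_text(text):
    """Keep each closing tag, drop whatever follows it up to the next tag."""
    return LOOSE_TEXT.sub(r"</\1>", text)


if __name__ == "__main__":
    print(strip_loose_text(SAMPLE))
```

Because the lookahead does not consume the `<`, the next tag is still available when the engine continues scanning.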

Non greedy regex

点点圈 submitted on 2019-12-06 06:22:05
I need to get the values inside some tags in a comment in a PHP file like this:

    php code
    /* this is a comment
    !-
    <titulo>titulo3</titulo>
    <funcion>
    <descripcion>esta es la descripcion de la funcion 6</descripcion>
    </funcion>
    <funcion>
    <descripcion>esta es la descripcion de la funcion 7</descripcion>
    </funcion>
    <otros>
    <descripcion>comentario de otros 2a hoja</descripcion>
    </otros>
    -!
    */
    some php code

As you can see, the file has newlines and repeated tags such as <funcion></funcion>, and I need to get every single one of those tags, so I was trying something like this: preg_match_all("/(<funcion>)(.* …
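One common approach to this kind of extraction is a lazy `.*?` combined with a flag that lets `.` match newlines (the `s` modifier in PCRE and `preg_*`, `re.DOTALL` in Python). Below is a minimal sketch in Python rather than PHP. The sample string is an abridged copy of the comment from the question, and `funcion_descriptions` is a hypothetical helper.

```python
import re

# Abridged copy of the comment block from the question, with newlines.
SOURCE = """/* this is a comment
!-
<titulo>titulo3</titulo>
<funcion>
  <descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
  <descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
  <descripcion>comentario de otros 2a hoja</descripcion>
</otros>
-!
*/"""


def funcion_descriptions(text):
    """Hypothetical helper: the <descripcion> text of every <funcion> block."""
    # Lazy .*? stops at the first </funcion>; DOTALL lets '.' cross newlines.
    blocks = re.findall(r"<funcion>(.*?)</funcion>", text, re.DOTALL)
    result = []
    for block in blocks:
        m = re.search(r"<descripcion>(.*?)</descripcion>", block, re.DOTALL)
        if m:
            result.append(m.group(1))
    return result


if __name__ == "__main__":
    # A greedy .* runs from the first <funcion> to the last </funcion>: one match.
    print(len(re.findall(r"<funcion>(.*)</funcion>", SOURCE, re.DOTALL)))
    print(funcion_descriptions(SOURCE))
```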

Regex is behaving lazy, should be greedy

冷暖自知 submitted on 2019-12-05 17:44:05
Question: I thought that by default my regex would exhibit the greedy behavior I want, but it does not in the following code:

    Regex keywords = new Regex(@"in|int|into|internal|interface");
    var targets = keywords.ToString().Split('|');
    foreach (string t in targets)
    {
        Match match = keywords.Match(t);
        Console.WriteLine("Matched {0,-9} with {1}", t, match.Value);
    }

Output:

    Matched in        with in
    Matched int       with in
    Matched into      with in
    Matched internal  with in
    Matched interface with in

Now I realize that I …
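What the output shows is alternation order, not greediness. A backtracking engine such as .NET's (or Python's `re`) tries the alternatives of `in|int|into|...` from left to right and accepts the first one that lets the whole pattern match, so `in` wins for every target. Below is a minimal Python sketch of that behavior and of one possible fix: word boundaries (`\b`) that force the chosen alternative to end where the word ends. Listing the longer keywords first is another option. The `show` helper is hypothetical.

```python
import re

KEYWORDS = ["in", "int", "into", "internal", "interface"]

# Alternatives are tried left to right; the first one that succeeds wins.
plain = re.compile("|".join(KEYWORDS))

# \b...\b makes "in" fail inside "int", so the engine moves on to "int".
bounded = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b")


def show(pattern):
    """Hypothetical helper: pair each keyword with what the pattern matches in it."""
    return [(t, pattern.search(t).group()) for t in KEYWORDS]


if __name__ == "__main__":
    for t, m in show(plain):
        print(f"Matched {t:<9} with {m}")  # always "in"
    for t, m in show(bounded):
        print(f"Matched {t:<9} with {m}")  # the whole keyword
```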

Regular expression in regards to question mark “lazy” mode

好久不见. submitted on 2019-12-04 05:48:08
Question: I understand that the ? here means "lazy". My question is essentially about [0-9]{2}? vs [0-9]{2}. Are they the same? If so, why would we write the former expression? Isn't lazy mode more expensive performance-wise? If not, can you explain the difference?

Answer 1: There is no difference between [0-9]{2} and [0-9]{2}?. The difference between greedy matching and lazy matching (the addition of a ?) has to do with backtracking. Regular expression engines are built to match text from left to right. …
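The answer's point can be checked directly: a lazy modifier only has an effect when the quantifier allows a range of repetition counts. Below is a short sketch using Python's `re`, chosen here only for illustration.

```python
import re

DIGITS = "12345"

# Exact count: {2} and {2}? can only ever take two digits.
fixed_greedy = re.match(r"[0-9]{2}", DIGITS).group()
fixed_lazy = re.match(r"[0-9]{2}?", DIGITS).group()

# Range of counts: here the ? changes the result.
range_greedy = re.match(r"[0-9]{1,3}", DIGITS).group()
range_lazy = re.match(r"[0-9]{1,3}?", DIGITS).group()

if __name__ == "__main__":
    print(fixed_greedy, fixed_lazy)  # 12 12
    print(range_greedy, range_lazy)  # 123 1
```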

Non-greedy regex quantifier gives greedy result

僤鯓⒐⒋嵵緔 submitted on 2019-12-04 03:20:44
Question: I have a .NET regex which I am testing using Windows PowerShell. The output is as follows:

    > [System.Text.RegularExpressions.Regex]::Match("aaa aaa bbb", "aaa.*?bbb")

    Groups : {aaa aaa bbb}
    Success : True
    Captures : {aaa aaa bbb}
    Index : 0
    Length : 11
    Value : aaa aaa bbb

My expectation was that using the ? quantifier would cause the match to be aaa bbb, as the second group of a's is sufficient to satisfy the expression. Is my understanding of non-greedy quantifiers flawed, or am I testing …
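The result follows from how the engine searches: it tries start positions from left to right and returns the first match it finds, and laziness only controls where that match ends, not where it starts. Since `aaa.*?bbb` already succeeds at index 0, the later `aaa` is never considered as a start. Below is a minimal Python sketch. The tempered pattern `aaa(?:(?!aaa).)*?bbb` is one common workaround, swapped in here rather than taken from the question: it forbids another `aaa` inside the match.

```python
import re

TEXT = "aaa aaa bbb"

# Lazy quantifier, but the match still starts at the leftmost possible index.
lazy = re.search(r"aaa.*?bbb", TEXT)

# Tempered token: each character consumed must not begin another "aaa",
# so an attempt starting at index 0 fails and the engine moves on.
tempered = re.search(r"aaa(?:(?!aaa).)*?bbb", TEXT)

if __name__ == "__main__":
    print(lazy.start(), repr(lazy.group()))          # 0 'aaa aaa bbb'
    print(tempered.start(), repr(tempered.group()))  # 4 'aaa bbb'
```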

How to non-greedy multiple lookbehind matches

我只是一个虾纸丫 submitted on 2019-12-03 18:00:25
Question:

    Source: <prefix><content1><suffix1><prefix><content2><suffix2>
    Engine: PCRE
    RegEx1: (?<=<prefix>)(.*)(?=<suffix1>)
    RegEx2: (?<=<prefix>)(.*)(?=<suffix2>)
    Result1: <content1>
    Result2: <content1><suffix1><prefix><content2>

The desired result for RegEx2 is just <content2>, but the match is obviously greedy. How do I make RegEx2 non-greedy and use only the last matching lookbehind? [I hope I have translated this correctly from the NoteTab syntax. I don't do much RegEx coding. The <prefix>, <content> & …
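Making RegEx2 lazy on its own would not help, for the same leftmost-match reason: the lookbehind is already satisfied right after the first `<prefix>`, so the match starts there. One common workaround is a tempered token that refuses to step over another `<prefix>`. Below is a minimal sketch in Python, assuming its fixed-width lookbehind behaves like PCRE's for this literal; NoteTab's own syntax may differ. The `first_group` helper is hypothetical.

```python
import re

SOURCE = "<prefix><content1><suffix1><prefix><content2><suffix2>"

REGEX1 = r"(?<=<prefix>)(.*)(?=<suffix1>)"
REGEX2 = r"(?<=<prefix>)(.*)(?=<suffix2>)"
REGEX2_LAZY = r"(?<=<prefix>)(.*?)(?=<suffix2>)"
# Tempered token: never consume a character that starts another "<prefix>".
REGEX2_TEMPERED = r"(?<=<prefix>)((?:(?!<prefix>).)*?)(?=<suffix2>)"


def first_group(pattern):
    """Hypothetical helper: group 1 of the first match in SOURCE."""
    return re.search(pattern, SOURCE).group(1)


if __name__ == "__main__":
    for name, pattern in [("RegEx1", REGEX1), ("RegEx2", REGEX2),
                          ("RegEx2 lazy", REGEX2_LAZY),
                          ("RegEx2 tempered", REGEX2_TEMPERED)]:
        print(f"{name}: {first_group(pattern)}")
```

The lazy variant returns the same text as the greedy one here, which is why only the tempered token gives `<content2>`.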

std::regex_match and lazy quantifier with strange behavior

旧街凉风 submitted on 2019-12-02 11:02:05
I know that a lazy quantifier matches as few as possible (the shortest match). I also know the constructor:

    basic_regex( ..., flag_type f = std::regex_constants::ECMAScript );

And: "ECMAScript supports non-greedy matches, and the ECMAScript regex "<tag[^>]*>.*?</tag>" would match only until the first closing tag ..." (en.cppreference)

And: "At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ..." (en.cppreference)

And: "Note that regex_match will only successfully match a regular expression to …
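`std::regex_match`, like Python's `re.fullmatch`, only succeeds when the pattern matches the entire target sequence, whereas `std::regex_search` (like `re.search`) may match a substring. A lazy `.*?` in a whole-sequence match is therefore still forced to extend as far as needed to reach the end. Below is a minimal Python analogy; the sample string is an assumption, not the asker's C++ input.

```python
import re

PATTERN = r"<tag[^>]*>.*?</tag>"
TEXT = "<tag>a</tag><tag>b</tag>"

# search: the lazy .*? stops at the first closing tag.
partial = re.search(PATTERN, TEXT).group()

# fullmatch: the whole string must match, so .*? has to grow past the first </tag>.
whole = re.fullmatch(PATTERN, TEXT).group()

if __name__ == "__main__":
    print(partial)  # <tag>a</tag>
    print(whole)    # <tag>a</tag><tag>b</tag>
```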

Regular expression in regards to question mark “lazy” mode

大城市里の小女人 submitted on 2019-12-02 09:34:00
I understand that the ? here means "lazy". My question is essentially about [0-9]{2}? vs [0-9]{2}. Are they the same? If so, why would we write the former expression? Isn't lazy mode more expensive performance-wise? If not, can you explain the difference?

There is no difference between [0-9]{2} and [0-9]{2}?. The difference between greedy matching and lazy matching (the addition of a ?) has to do with backtracking. Regular expression engines are built to match text from left to right. Therefore, when you ask an expression to match a range of characters, it is logical that it matches as many as …

remove text between delimiters, multiple times on each line

邮差的信 submitted on 2019-12-02 03:14:41
Question: I need to remove the text between the delimiters "<" and ">", but there are multiple instances of these on each line of my text file. For example, I want to turn this:

    person 1, person 2<email2@mail.com>, person 3<email3@mail.com>, person 4<email4@mail.com>

into this:

    person 1, person 2, person 3, person 4

I've tried a few things, including sed:

    sed -e 's/<.*>//' filename.csv

but this removes everything between the first < and the last >, giving the result person 1, person 2.

Answer 1: you can …
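sed's POSIX regular expressions have no lazy quantifier, so one common fix is a negated character class, `<[^>]*>`, which cannot run past the first `>`, plus the `g` flag so every occurrence on the line is replaced: `sed -e 's/<[^>]*>//g' filename.csv`. Below is the same idea as a minimal Python sketch; `strip_angle_brackets` is a hypothetical helper.

```python
import re

LINE = ("person 1, person 2<email2@mail.com>, person 3<email3@mail.com>, "
        "person 4<email4@mail.com>")


def strip_angle_brackets(line):
    """Hypothetical helper: remove every <...> span from the line."""
    # [^>]* cannot cross a '>', so each match ends at the nearest '>'.
    return re.sub(r"<[^>]*>", "", line)


if __name__ == "__main__":
    # Greedy .* reproduces the sed problem: one match from the first '<' to the last '>'.
    print(re.sub(r"<.*>", "", LINE, count=1))  # person 1, person 2
    print(strip_angle_brackets(LINE))          # person 1, person 2, person 3, person 4
```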