non-greedy

Python non-greedy regex to clean xml

Submitted by 情到浓时终转凉″ on 2019-12-22 13:56:10
Question: I have an XML file that has some unwanted characters in it:

    <data> <tag>blar </tag><tagTwo> bo </tagTwo> some extra characters not enclosed that I want to remove <anothertag>bbb</anothertag> </data>

I thought the following non-greedy substitution would remove the characters that were not properly enclosed in <sometag></sometag>:

    re.sub("</([a-zA-Z]+)>.*?<","</\\1><",text)

where text is the XML text. The intent of each part: remember the tag name, read everything (as little as possible) until the next <, then put the closing tag back (without the extra characters) and reopen the next tag …
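A minimal, self-contained run of the substitution as written, using the sample text from the question on a single line (an assumption about layout; if the stray text spans lines, re.DOTALL would be needed, since '.' does not match newlines by default):

```python
import re

text = ("<data> <tag>blar </tag><tagTwo> bo </tagTwo> "
        "some extra characters not enclosed that I want to remove "
        "<anothertag>bbb</anothertag> </data>")

# After each closing tag, consume as little as possible up to the next '<'
# and splice the closing tag and the following '<' back together.
cleaned = re.sub(r"</([a-zA-Z]+)>.*?<", r"</\1><", text)
print(cleaned)
# <data> <tag>blar </tag><tagTwo> bo </tagTwo><anothertag>bbb</anothertag></data>
```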

lookahead in kate for patterns

Submitted by 删除回忆录丶 on 2019-12-21 22:54:23
Question: I'm working on compiling a table of cases for a legal book. I've converted it to HTML so I can use the tags for search-and-replace operations, and I'm currently working in Kate. The text refers to the names of cases, and the citations for the cases are in the footnotes, e.g.

    <i>Smith v Jones</i>127 ......... [other stuff including newline characters].......</br>127 (1937) 173 ER 406;

I've been able to get lookahead working in Kate, using:

    <i>.*</i>([0-9]{1,4}) .+<br/>\1 .*<br/>

...but I've run …
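Kate's own regex syntax aside, the same lookahead-plus-backreference idea can be sketched in Python; the sample line and the group names below are assumptions for illustration, not the asker's data:

```python
import re

line = '<i>Smith v Jones</i>127 ... other stuff ... </br>127 (1937) 173 ER 406;'

# Capture the case name and its footnote number, then use a lookahead with a
# backreference to pull in the citation that carries the same number.
pattern = re.compile(
    r'<i>(?P<case>.*?)</i>(?P<note>\d{1,4})'          # case name + footnote marker
    r'(?=.*?</br>(?P=note)\s+(?P<cite>[^;]+);)',      # look ahead to the footnote body
    re.DOTALL)

for m in pattern.finditer(line):
    print(m.group('case'), '->', m.group('cite'))
# Smith v Jones -> (1937) 173 ER 406
```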

std::regex_match and lazy quantifier with strange behavior

Submitted by ↘锁芯ラ on 2019-12-20 06:28:41
Question: I know that a lazy quantifier matches as few characters as possible (the shortest match). I also know that the constructor is:

    basic_regex( ..., flag_type f = std::regex_constants::ECMAScript );

and that, per en.cppreference, "ECMAScript supports non-greedy matches, and the ECMAScript regex "<tag[^>]*>.*?</tag>" would match only until the first closing tag", and "At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be" …
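The surprise usually comes from the fact that std::regex_match must consume the entire input, so a lazy quantifier is forced to keep expanding anyway. The same effect can be reproduced in Python, taking re.fullmatch as a rough analogue of std::regex_match (an assumption made purely for illustration):

```python
import re

s = "<tag>one</tag><tag>two</tag>"

# A plain search with a lazy quantifier stops at the first closing tag ...
print(re.search(r"<tag[^>]*>(.*?)</tag>", s).group(1))
# one

# ... but a full match (the analogue of std::regex_match) forces the lazy
# group to keep expanding until the whole input is covered.
print(re.fullmatch(r"<tag[^>]*>(.*?)</tag>", s).group(1))
# one</tag><tag>two
```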

non-greedy matching in Scala RegexParsers

Submitted by 半腔热情 on 2019-12-19 05:21:47
Question: Suppose I'm writing a rudimentary SQL parser in Scala. I have the following:

    class Arith extends RegexParsers {
      def selectstatement: Parser[Any] = selectclause ~ fromclause
      def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
      def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
      def tokens: Parser[Any] = rep(token) // how to make this non-greedy?
      def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
    }

When trying to match selectstatement against SELECT foo FROM bar, how do I prevent the selectclause …
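One common suggestion for this kind of grammar is not to make rep non-greedy but to stop token from matching the keywords at all, typically with a negative lookahead in the token regex. A Python sketch of that idea follows (illustrative only; it is not RegexParsers code, and the keyword list is an assumption):

```python
import re

sql = "SELECT foo FROM bar"

# A token is any word that is not the FROM keyword, so the select clause
# stops consuming input as soon as FROM appears.
token = r"(?!(?i:FROM)\b)\w+"
m = re.match(rf"(?i:SELECT)\s+((?:{token}\s*)+)", sql)
print(repr(m.group(1)))
# 'foo '  (FROM bar is left for the next clause)
```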

How can I get a list of all possible matches in between non-greedy and greedy

Submitted by 给你一囗甜甜゛ on 2019-12-12 19:57:43
Question: I have the string "I like lettuce and carrots and onions" in Python. I thought I could get the following matches

    ["I like lettuce", "I like lettuce and carrots", "I like lettuce and carrots and onions"]

by using a regex like .* and (the regex should match any characters up to " and"). However, the greedy version ( .* and ) gives me only the last match, and the non-greedy version ( .*? and ) gives me only the first match. How can I get all three matches? (I do not need a regex …
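re.findall and re.finditer only report non-overlapping matches, so neither the greedy nor the lazy version can yield all three strings by itself. One workaround (a sketch, not the asker's code) is to locate every cut point and slice the prefix up to each one:

```python
import re

s = "I like lettuce and carrots and onions"

# Find the start of every " and" plus the end of the string, then take the
# prefix up to each of those positions.
cut_points = [m.start() for m in re.finditer(r" and|$", s)]
print([s[:i] for i in cut_points])
# ['I like lettuce', 'I like lettuce and carrots', 'I like lettuce and carrots and onions']
```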

why is this regular expression returning only one match?

Submitted by 不羁岁月 on 2019-12-12 05:13:12
Question: Here is my input:

    xxx999xxx888xxx777xxx666yyy
    xxx222xxx333xxx444xxx555yyy

This is the expression:

    xxx.*xxx(?<matchString>(.(?!xxx.*xxx))*?)xxx.*yyy

It's returning 444. I'd like it to return both 444 and 777, but I can't get anywhere with this. I have the ! exclusion so that it matches only the innermost on the left side (which works great when I am searching for only one result, which is most of the time). However, I have a feeling that that is related to why it is skipping the first result …
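One way to see the problem is that the greedy leading xxx.*xxx is free to run across the first record while the engine hunts for a match in the second. A rough Python sketch that pins each match to a single line (a simplified pattern, not the asker's original flavour):

```python
import re

text = """xxx999xxx888xxx777xxx666yyy
xxx222xxx333xxx444xxx555yyy"""

# Anchoring each match to one line keeps the greedy leading xxx.*xxx from
# swallowing the first record; the inner group then grabs the chunk between
# the last pair of 'xxx' delimiters on that line.
pattern = re.compile(r"^xxx.*xxx(?P<matchString>(?:(?!xxx).)+)xxx[^\n]*yyy$", re.M)
for m in pattern.finditer(text):
    print(m.group("matchString"))
# 777
# 444
```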

Java Regexp: UNGREEDY flag

Submitted by ℡╲_俬逩灬. on 2019-12-12 04:54:15
Question: I'd like to port a generic text-processing tool, Texy!, from PHP to Java. This tool does ungreedy matching everywhere, using preg_match_all("/.../U"), so I am looking for a library that has some UNGREEDY flag. I know I could use the .*? syntax, but there are really many regular expressions I would have to rewrite and re-check with every updated version. I've checked:

    ORO - seems to be abandoned
    Jakarta Regexp - no support
    java.util.regex - no support

Is there any such library? Thanks,
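Since java.util.regex has no equivalent of PCRE's /U modifier, the alternatives are a PCRE-compatible library or rewriting the patterns. A rough, hypothetical sketch of the mechanical rewrite, shown in Python for brevity; it deliberately ignores escapes, character classes, and other corner cases, so treat it as a starting point rather than a safe transformation:

```python
import re

def make_ungreedy(pattern: str) -> str:
    # Append '?' to *, + and {m,n} quantifiers that are not already lazy or
    # possessive.  Escaped metacharacters and character classes are NOT handled.
    return re.sub(r"([*+]|\{\d+(?:,\d*)?\})(?![?+])", r"\1?", pattern)

print(make_ungreedy(r"<b>.*</b> and <i>.+</i>"))
# <b>.*?</b> and <i>.+?</i>
```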

Understanding regex pattern used to find string between strings in html

Submitted by 那年仲夏 on 2019-12-12 00:27:28
Question: I have the following HTML file:

    <!-- <div class="_5ay5"><table class="uiGrid _51mz" cellspacing="0" cellpadding="0"><tbody><tr class="_51mx"><td class="_51m-"><div class="_u3y"><div class="_5asl"><a class="_47hq _5asm" href="/Dev/videos/1610110089242029/" aria-label="Who said it?" ajaxify="/Dev/videos/1610110089242029/" rel="theater">

In order to pull the string of numbers between videos/ and /", I'm using the following method that I found:

    import re
    Source_file = open('source.html').read()
    …
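A minimal sketch of the extraction being discussed; the variable names continue the asker's snippet, but the pattern itself is an assumption, not the method they found:

```python
import re

Source_file = open('source.html').read()

# Grab the run of digits that sits between 'videos/' and the closing '/"'.
video_ids = re.findall(r'videos/(\d+)/"', Source_file)
print(video_ids)
# ['1610110089242029', '1610110089242029']  (once per href/ajaxify attribute in the sample line)
```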

Sed replace at second occurrence

Submitted by 感情迁移 on 2019-12-10 14:36:39
Question: I want to remove a pattern with sed, but only at its second occurrence on each line. What's in file.csv:

    a,Name(null)abc.csv,c,d,Name(null)abc.csv,f
    a,Name(null)acb.csv,c,d,Name(null)acb.csv,f
    a,Name(null)cba.csv,c,d,Name(null)cba.csv,f

Output wanted:

    a,Name(null)abc.csv,c,d,Name,f
    a,Name(null)acb.csv,c,d,Name,f
    a,Name(null)cba.csv,c,d,Name,f

This is what I tried:

    sed -r 's/(\(null)\).*csv//' file.csv

The problem here is that the regex is too …
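The grouping idea that fixes this (sketched here in Python rather than sed, purely for illustration) is to match and keep the first occurrence explicitly so that only the second one is dropped:

```python
import re

line = "a,Name(null)abc.csv,c,d,Name(null)abc.csv,f"

# Group 1 keeps the first '(null)...csv', group 2 keeps whatever sits between
# the two occurrences; the second '(null)...csv' is matched but not kept.
cleaned = re.sub(r"^(.*?\(null\).*?csv)(.*?)\(null\).*?csv", r"\1\2", line)
print(cleaned)
# a,Name(null)abc.csv,c,d,Name,f
```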

Regex: Is Lazy Worse?

Submitted by 喜欢而已 on 2019-12-10 01:23:39
Question: I have always written regexes like this:

    <A HREF="([^"]*)" TARGET="_blank">([^<]*)</A>

but I just learned about this lazy thing, and that I can write it like this:

    <A HREF="(.*?)" TARGET="_blank">(.*?)</A>

Is there any disadvantage to using this second approach? The regex is definitely more compact (even SO parses it better). Edit: There are two best answers here, which point out two important differences between the expressions. ysth's answer points to a weakness in the non-greedy/lazy one, in …
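The difference ysth's answer is pointing at can be shown directly. With some made-up sample HTML (an assumption), the negated-class version stays inside a single attribute value, while the lazy version will happily crawl across quotes and tag boundaries to make the rest of the pattern fit:

```python
import re

html = ('<A HREF="first.html" TARGET="_self">no</A> '
        '<A HREF="second.html" TARGET="_blank">yes</A>')

# [^"]* cannot leave the attribute value, so only the genuine _blank link matches.
print(re.findall(r'<A HREF="([^"]*)" TARGET="_blank">([^<]*)</A>', html))
# [('second.html', 'yes')]

# .*? keeps expanding until the rest of the pattern matches somewhere, even if
# that means crossing a closing quote and the next tag.
print(re.findall(r'<A HREF="(.*?)" TARGET="_blank">(.*?)</A>', html))
# [('first.html" TARGET="_self">no</A> <A HREF="second.html', 'yes')]
```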