regex-greedy

Sed seems to replace only the last occurrence in global string substitution

风格不统一 submitted on 2019-12-08 00:01:41
Question: I use this command, but it's not working as intended:
echo "0+223+141+800+450+1*(106+400)+1*(1822+500)+1*(183+400)" | sed 's/\*\(.*\)+/*\1suma/g'
This is the expected output:
0+223+141+800+450+1*(106suma400)+1*(1822suma500)+1*(183suma400)
but this is what I get:
0+223+141+800+450+1*(106+400)+1*(1822+500)+1*(183suma400)
It looks like only the last occurrence is being replaced, despite the use of g.
Answer 1: Try the following: echo "0+223+141+800+450+1*(106+400)+1*(1822+500)+1*(183+400)" | sed 's/
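The behaviour in the question can be reproduced outside sed. The Python sketch below is an illustration, not the truncated answer's command: it assumes the cause is that `.*` is greedy and runs to the last `+` on the line, and it swaps in a negated character class `[^+]*` as one possible fix.

```python
import re

expr = "0+223+141+800+450+1*(106+400)+1*(1822+500)+1*(183+400)"

# Same shape as the sed pattern: a literal '*', a greedy '.*', then a literal '+'.
# The greedy group swallows everything up to the LAST '+', so there is one match.
greedy = re.sub(r"\*(.*)\+", r"*\1suma", expr)

# Negated class: the group cannot run past the first '+' after each '*',
# so every "*(...+" stretch is matched and replaced separately.
bounded = re.sub(r"\*([^+]*)\+", r"*\1suma", expr)

print(greedy)
print(bounded)
```

Because the whole tail of the line is one match for the greedy pattern, the g flag has nothing further to replace. The same idea in sed would presumably be written with `[^+]*` in place of `.*`.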

RegEx for HTML tag conversion

可紊 submitted on 2019-12-07 12:29:50
Question: For some reason, I want to convert strings which contain <p style="text-align:center; others-style:value;">Content</p> to <center>Content</center> in PHP. The text-align value can be left, right, or center. When there are other styles, I want to omit them. How can I do that in PHP? Edit: Maybe I was not clear enough in my original question. What I mean is that I want to convert contents with text-align:center to be wrapped by <center> , and contents with text-align:right to be

Python non-greedy regex to clean xml

烈酒焚心 submitted on 2019-12-06 07:50:32
I have an 'XML file' that has some unwanted characters in it:
<data> <tag>blar </tag><tagTwo> bo </tagTwo> some extra characters not enclosed that I want to remove <anothertag>bbb</anothertag> </data>
I thought the following non-greedy substitution would remove the characters that were not properly encased in <sometag></sometag>:
re.sub("</([a-zA-Z]+)>.*?<","</\\1><",text)
The parts are meant to work as follows:
- </([a-zA-Z]+)> : remember the tag
- .*?< : read everything until the next '<' (non-greedy)
- "</\\1><" : put the tag back without the extra characters and reopen the next tag
- text : the XML text
This regex seems only to find the position indicated with the [[]] in </tag>[[]]
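One possible explanation, which the truncated excerpt does not confirm, is that the real input contains line breaks: without `re.DOTALL`, `.` does not match a newline, so `.*?<` can only succeed where the next `<` is on the same line. The sketch below assumes such a multi-line input (the sample text is hypothetical) and uses a negated class plus a lookahead as one alternative. `strip_stray_text` is an illustrative name.

```python
import re

# Hypothetical multi-line version of the sample from the question.
text = (
    "<data>\n"
    "<tag>blar </tag><tagTwo> bo </tagTwo>\n"
    "some extra characters\n"
    "<anothertag>bbb</anothertag>\n"
    "</data>"
)

# The pattern from the question: '.' stops at newlines, so on this input only
# "</tag><" matches, and replacing it with itself changes nothing.
original = re.sub(r"</([a-zA-Z]+)>.*?<", r"</\1><", text)


def strip_stray_text(xml_text):
    """Drop whatever sits between a closing tag and the next '<'."""
    # [^<]* matches newlines too; the lookahead leaves the next '<' unconsumed,
    # so back-to-back closing tags can each start their own match.
    return re.sub(r"(</[a-zA-Z]+>)[^<]*(?=<)", r"\1", xml_text)


cleaned = strip_stray_text(text)
print(cleaned)
```

This also removes whitespace between a closing tag and the next tag, which may or may not be wanted. For anything beyond a quick cleanup, a real XML parser is usually the safer tool.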

Regex to match whatsapp chat log

扶醉桌前 submitted on 2019-12-06 07:20:58
I've been trying to create a regex for WhatsApp chat logs. So far I've been able to achieve this (see the test link in the original post) by creating the following regex:
(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)\s-\s(?P<name>[^:]*):(?P<message>.*)
The problem with this regex is that it cannot match long messages that span multiple lines. You can see the issue in the link provided above. Help would be appreciated. Thank you.
There you go:
^ (?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+ (?P<name>[^:]+):\s+ (?P<message>[\s\S]+?) (?=^\d{2}|\Z)
See your modified demo on
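As a rough check of the answer's pattern, here is a Python sketch. It assumes the pattern is meant to be compiled with the multiline and verbose flags (the excerpt does not state the flags, but the pattern contains layout spaces and uses `^` inside the lookahead). The sample log and the name `parse_chat` are made up for illustration.

```python
import re

CHAT_ENTRY = re.compile(
    r"""
    ^(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+   # date and time, then the dash separator
    (?P<name>[^:]+):\s+                             # sender, up to the first colon
    (?P<message>[\s\S]+?)                           # lazy body, newlines included
    (?=^\d{2}|\Z)                                   # stop before the next entry or at the end
    """,
    re.MULTILINE | re.VERBOSE,
)


def parse_chat(log):
    """Return (datetime, name, message) tuples, one per chat entry."""
    return [
        (m.group("datetime").strip(), m.group("name"), m.group("message").strip())
        for m in CHAT_ENTRY.finditer(log)
    ]


# Hypothetical log: the first message continues on a second line.
sample_log = (
    "12/05/2019, 9:15 p.m. - Alice: Hello\n"
    "this line continues the message\n"
    "12/05/2019, 9:16 p.m. - Bob: Hi\n"
)

for entry in parse_chat(sample_log):
    print(entry)
```

The lazy `[\s\S]+?` keeps consuming until the lookahead sees a line that starts with two digits, so a continuation line that itself begins with two digits would be mistaken for a new entry.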

Non greedy regex

点点圈 submitted on 2019-12-06 06:22:05
I need to get the values inside some tags in a comment in a PHP file like this:
php code
/* this is a comment !- <titulo>titulo3</titulo> <funcion> <descripcion>esta es la descripcion de la funcion 6</descripcion> </funcion> <funcion> <descripcion>esta es la descripcion de la funcion 7</descripcion> </funcion> <otros> <descripcion>comentario de otros 2a hoja</descripcion> </otros> -! */
some php code
As you can see, the file has newlines and repetitions of tags like <funcion></funcion>, and I need to get every single one of the tags, so I was trying something like this: preg_match_all("/(<funcion>)(.*
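The asker's pattern is cut off, so the following is not a reconstruction of it. It is a minimal Python sketch of the general technique the title refers to: a lazy quantifier combined with a flag that lets `.` cross newlines, assuming the comment spans several lines as the question says.

```python
import re

comment = """/* this is a comment !-
<titulo>titulo3</titulo>
<funcion>
<descripcion>esta es la descripcion de la funcion 6</descripcion>
</funcion>
<funcion>
<descripcion>esta es la descripcion de la funcion 7</descripcion>
</funcion>
<otros>
<descripcion>comentario de otros 2a hoja</descripcion>
</otros>
-! */"""

# Lazy: each match ends at the nearest </funcion>.
lazy_blocks = [b.strip() for b in re.findall(r"<funcion>(.*?)</funcion>", comment, re.DOTALL)]

# Greedy: a single match runs from the first <funcion> to the last </funcion>.
greedy_blocks = re.findall(r"<funcion>(.*)</funcion>", comment, re.DOTALL)

print(len(lazy_blocks), len(greedy_blocks))
```

In PHP the corresponding pieces would be the `s` pattern modifier for dot-matches-newline and `.*?` for the lazy quantifier.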

Regex is behaving lazy, should be greedy

冷暖自知 submitted on 2019-12-05 17:44:05
Question: I thought that by default my Regex would exhibit the greedy behavior that I want, but it does not in the following code:
Regex keywords = new Regex(@"in|int|into|internal|interface"); var targets = keywords.ToString().Split('|'); foreach (string t in targets) { Match match = keywords.Match(t); Console.WriteLine("Matched {0,-9} with {1}", t, match.Value); }
Output:
Matched in with in
Matched int with in
Matched into with in
Matched internal with in
Matched interface with in
Now I realize that I
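The output above is what ordered alternation produces in backtracking engines: alternatives are tried left to right and the first one that succeeds wins, regardless of length. Python's `re` module behaves the same way, so the sketch below reproduces the result and shows one common workaround, listing longer alternatives first. The helper name `first_match` is illustrative.

```python
import re

keywords = ["in", "int", "into", "internal", "interface"]

# Alternatives in the order from the question: "in" always wins.
as_written = re.compile("|".join(re.escape(k) for k in keywords))

# Longest alternative first, so a longer keyword is tried before its prefix.
by_length = sorted(keywords, key=len, reverse=True)
longest_first = re.compile("|".join(re.escape(k) for k in by_length))


def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group() if m else None


for word in keywords:
    print(word, first_match(as_written, word), first_match(longest_first, word))
```

Anchoring the alternation with word boundaries, for example `\b(?:in|int|into|internal|interface)\b`, is another option when whole keywords are wanted, since a short alternative then fails at the trailing boundary and the engine moves on to the next one.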

Regular expression greedy match not working as expected

柔情痞子 submitted on 2019-12-05 16:22:15
I have a very basic regular expression and I just can't figure out why it's not working, so the question has two parts: why does my current version not work, and what is the correct expression? The rules are pretty simple:
- Must have a minimum of 3 characters.
- If a % character is the first character, must have a minimum of 4 characters.
So the following cases should work out as follows:
AB - fail
ABC - pass
ABCDEFG - pass
% - fail
%AB - fail
%ABC - pass
%ABCDEFG - pass
%%AB - pass
The expression I am using is: ^%?\S{3}
Which to me means:
^ - Start of string
%? - Greedy check for 0 or 1 % character
\S{3} - 3
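The one case that goes wrong with `^%?\S{3}` is `%AB`: when `\S{3}` fails after the `%` has been consumed, the engine backtracks, lets `%?` match nothing, and `\S{3}` then matches `%AB` itself. The Python sketch below checks the listed cases against the original pattern and against one possible alternative that splits the two rules into separate branches. That alternative is an assumption on my part and is not taken from the truncated source.

```python
import re

AS_WRITTEN = re.compile(r"^%?\S{3}")

# Branch 1: a leading '%' followed by at least three more non-space characters.
# Branch 2: a first character that is not '%', followed by at least two more.
TWO_BRANCH = re.compile(r"^(?:%\S{3}|[^%\s]\S{2})")

# Expected results, copied from the question.
cases = {
    "AB": False,
    "ABC": True,
    "ABCDEFG": True,
    "%": False,
    "%AB": False,
    "%ABC": True,
    "%ABCDEFG": True,
    "%%AB": True,
}

wrong_as_written = [s for s, want in cases.items() if bool(AS_WRITTEN.search(s)) != want]
wrong_two_branch = [s for s, want in cases.items() if bool(TWO_BRANCH.search(s)) != want]

print(wrong_as_written, wrong_two_branch)
```

Because the second branch cannot start with `%`, backtracking out of the first branch no longer lets a `%` count as one of the three required characters.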

Stackoverflow in pattern matching in java

孤人 submitted on 2019-12-04 23:21:41
I tried to split a line on spaces that are not enclosed in double quotes. My regex is:
(([\"]([^\\\"]|\\.)+[\"]|[^ ]+))+
My code:
Pattern regex = Pattern.compile("(([\"]([^\\\"]|\\.)+[\"]|[^ ]+))+"); Matcher regexMatcher = regex.matcher(line); List<String> rule = new ArrayList<String>(); while(regexMatcher.find()) rule.add(regexMatcher.group());
Input for which it fails:
SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "(?i:\b(?:(?:s(?:t(?:d(?:dev(_pop|_samp)?)?|r(?:_to_date|cmp))|u(?:b(?:str(?:ing(_index)?)?|(?:dat|tim)e)|m)|e(?:c(?:_to_time

java - Regex to split a string using spaces but not considering double quotes or single quotes

两盒软妹~` submitted on 2019-12-04 15:36:48
I want to split a string on spaces, ignoring spaces inside double quotes or single quotes. I tried using "Regex for splitting a string using space when not surrounded by single or double quotes", but it failed in some cases.
Input: It is a "beautiful day"'but i' cannot "see it"
The output should be:
It
is
a
"beautiful day"'but i'
cannot
"see it"
The regex in the above link resulted in:
It
is
a
"beautiful day"
'but i'
cannot
"see it"
I want "beautiful day"'but i' on one line. Can somebody help me write the correct regex?
This regex passes your test: " (?=(([^'\"]*['\"]){2})*[^'\"]*$)" It's
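The answer's pattern splits on a space only when an even number of quote characters remains between that space and the end of the string, which means the space is outside any quoted run. Here is the same idea as a Python sketch. The groups are made non-capturing because `re.split` would otherwise include captured text in its result, and the name `split_outside_quotes` is illustrative.

```python
import re

# A space followed by an even count of quote characters up to the end of the string.
SPLIT_OUTSIDE_QUOTES = re.compile(r"""[ ](?=(?:(?:[^'"]*['"]){2})*[^'"]*$)""")


def split_outside_quotes(text):
    """Split on spaces that are not inside single- or double-quoted runs."""
    return SPLIT_OUTSIDE_QUOTES.split(text)


tokens = split_outside_quotes("It is a \"beautiful day\"'but i' cannot \"see it\"")
for token in tokens:
    print(token)
```

The pattern treats single and double quotes interchangeably and only counts them, so an apostrophe inside a double-quoted phrase, or an unbalanced quote, would throw the count off.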

Greediness behaving differently in JavaScript?

点点圈 submitted on 2019-12-04 08:56:03
Question: There was this question which made me realise that the greediness of quantifiers is not always the same across regex engines. Taking the regex from that question and modifying it a bit: !\[(.*?)*\] (I know that * is redundant here, but I found what follows to be quite interesting behaviour). If we try to match it against: ![][][] I expected the first capture group to be empty, because (.*?) is lazy and will stop at the first ] it comes across. This is indeed what happens in: