Regex match before string comes up

Submitted by 不羁的心 on 2019-12-13 08:49:11

Question


So I have a file of 10,000+ lines of messages from a game server, like so:

11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp (-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)

11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp (-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7) distance (0.0, 0.0, 0.0)

The current regex I have is \d{1,4}\.\d{1}, which so far matches every decimal number in the line, including fragments of the date (11.0 and 7.2 out of 11.07.23):

11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp (-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)
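Running the current pattern over the first sample line shows why the timestamp gets in the way. Python's re module is used here only for illustration, since the question does not name a language. Note also that the minus signs are not part of the matches.

```python
import re

# The asker's current pattern: 1-4 digits, a literal dot, one digit.
NUMBER = re.compile(r"\d{1,4}\.\d{1}")

line = ("11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp "
        "(-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)")

matches = NUMBER.findall(line)
print(matches)
# ['11.0', '7.2', '90.8', '64.0', '167.5', '90.7', '64.0', '167.3', '0.0', '0.0', '0.2']
```

The first two matches come from the date, and the last three come from the distance group, which is why the match has to be tied to what follows it.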

I've been having trouble finding a way to match only the part that says:

(-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7)

that comes before the word "distance", without matching the timestamp at the beginning, and then replacing it so the lines end up like this:

11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp (-#, #, #) to (-#, #, #) distance (0.0, 0.0, 0.2)

11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp (-#, #, #) to (-#, #, #) distance (0.0, 0.0, 0.0)

Some extra information: the numbers may or may not be negative, and they range from one digit before the decimal point (e.g. 1.0) to four (e.g. 1234.0), which is why I need help matching only what comes before the word "distance".

EDIT: Alternatively, it would be fine if that whole part didn't show up at all:

11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp distance (0.0, 0.0, 0.2)

11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp distance (0.0, 0.0, 0.0)


Answer 1:


A fairly hairy-looking regex that extends your number-matching regex would be \((?:-?\d{1,4}\.\d{1}(?:, |\))){3} to \((?:-?\d{1,4}\.\d{1}(?:, |\))){3}(?= distance). Let's break that down a little.

It is made up of two identical parts, one for each parenthesized group of numbers: \((?:-?\d{1,4}\.\d{1}(?:, |\))){3}. The number pattern now allows an optional - in front, which makes it -?\d{1,4}\.\d{1}. Each number is followed by either a comma and a space or a closing paren, so that has to be part of the repeated unit as well: (?:, |\)). The unit is repeated three times with {3}, and the whole thing is prefixed with \( to match the opening paren of the group. This part appears twice, with " to " in between, to match both groups of numbers.

The final bit, (?= distance), is a positive look-ahead that ensures we only match number groups that are followed by the word distance. That word is not included in the match, but it has to be there for the regex to match.

I've used non-capturing groups (the (?: ... ) stuff) because I don't know what you want to do with the captures.

I've tried this out against your two example logfile lines using perl 5.12.2 and it seems to work.
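The same pattern can be tried with Python's re module (chosen only for illustration). One assumption that is not part of the answer: a leading space is added to the pattern so that deleting the match produces the EDIT output from the question, with a single space before distance.

```python
import re

# Answer 1's pattern: two "(n, n, n)" groups joined by " to ", matched only
# where " distance" follows. The leading space is an addition for this sketch
# so that removing the match leaves exactly one space before "distance".
COORDS = re.compile(
    r" \((?:-?\d{1,4}\.\d{1}(?:, |\))){3} to \((?:-?\d{1,4}\.\d{1}(?:, |\))){3}(?= distance)"
)

lines = [
    "11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp "
    "(-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)",
    "11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp "
    "(-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7) distance (0.0, 0.0, 0.0)",
]

# Delete the two coordinate groups; the timestamp and the distance group stay.
cleaned = [COORDS.sub("", line) for line in lines]
for line in cleaned:
    print(line)
```

Because the number unit is anchored between \( and a comma or \), the date fragments that the original \d{1,4}\.\d{1} picked up can no longer match.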




Answer 2:


You want to match from the ( that opens the first group of numbers to the ) that closes the second group, just before distance.

An untested, possibly too broad regexp could be: \([-0-9., ]+\) to \([-0-9., ]+\), but it may match things you don't want.
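A small Python check (illustrative only) of how this pattern behaves: it finds the two coordinate groups in the sample line but, as the answer warns, it also accepts parenthesized text that is not a number at all. The second input string is made up for the example.

```python
import re

# Answer 2's character-class pattern: any run of digits, '-', '.', ',' or
# space inside parentheses, twice, joined by " to ".
BROAD = re.compile(r"\([-0-9., ]+\) to \([-0-9., ]+\)")

line = ("11.07.23 08:40:16 [INFO] NC: Moving violation: wolfman98 from yasmp "
        "(-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3) distance (0.0, 0.0, 0.2)")

m = BROAD.search(line)
print(m.group(0))  # (-90.8, 64.0, 167.5) to (-90.7, 64.0, 167.3)

# The looseness it warns about: non-numeric text in parens also matches.
print(bool(BROAD.search("(- - -) to (...)")))  # True
```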




Answer 3:


/(?:\-|\b)\d{1,4}\.\d{1}\b(?=.*distance)/

Matches the numbers you want (tested in PHP).
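To get the "(-#, #, #)" output the question asks for, a pattern like this can be used in a substitution. The sketch below is in Python for illustration; wrapping the optional minus in a capture group (so the sign survives the replacement) is an addition, and mask_coordinates is a hypothetical helper name.

```python
import re

# Answer 3's idea: a number, optionally preceded by '-', that is followed
# somewhere later on the line by the word "distance". The '-' alternative is
# captured (an addition for this sketch) so the sign can be kept.
# The dot is written as \. so it only matches a literal '.'.
NUM_BEFORE_DISTANCE = re.compile(r"(-|\b)\d{1,4}\.\d{1}\b(?=.*distance)")


def mask_coordinates(line: str) -> str:
    """Replace each number before 'distance' with '#', keeping a leading '-'."""
    # \1 re-inserts the captured '-' (or nothing, when \b matched instead).
    return NUM_BEFORE_DISTANCE.sub(r"\1#", line)


line = ("11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp "
        "(-1228.3, 11.2, 1098.7) to (-1228.3, 11.2, 1098.7) distance (0.0, 0.0, 0.0)")
print(mask_coordinates(line))
# 11.07.23 10:57:44 [INFO] NC: Moving violation: AKxiZeroDark from yasmp (-#, #, #) to (-#, #, #) distance (0.0, 0.0, 0.0)
```

The numbers after distance are left alone because no later "distance" exists for the look-ahead to find, and the \b requirements keep the date 11.07.23 from matching.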




Answer 4:


Sounds like a job for perl:

use strict;
use warnings;
use ARGV::readonly;

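my $rx = qr/\([0-9,\.\- ]+\)/;   # one parenthesized group of coordinates

while (<>) {
    # " (a) to (b) distance (c)" at the end of the line becomes " distance (c)"
    s/ $rx to $rx( distance $rx\s*)$/$1/;
    print;
}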

Usage: script.pl input.txt > output.txt
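For readers not using Perl, a rough Python equivalent of the script's substitution might look like the sketch below. It is not a drop-in replacement: GROUP, STRIP, strip_coordinates and filter_lines are names made up for this example, and the script's <> / ARGV::readonly file handling is not reproduced.

```python
import re
from typing import Iterable, Iterator

# Same building block as the Perl script's $rx: a parenthesized run of
# digits, commas, dots, minus signs and spaces.
GROUP = r"\([0-9,.\- ]+\)"

# " (a) to (b) distance (c)" at the end of a line; group 1 keeps " distance (c)"
# plus any trailing whitespace (including the newline), like the Perl version.
STRIP = re.compile(rf" {GROUP} to {GROUP}( distance {GROUP}\s*)$")


def strip_coordinates(line: str) -> str:
    """Drop the two coordinate groups, keeping the distance part."""
    return STRIP.sub(r"\1", line, count=1)


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Apply strip_coordinates to every line, e.g. lines read from a file."""
    for line in lines:
        yield strip_coordinates(line)
```

Something like `with open("input.txt") as f: sys.stdout.writelines(filter_lines(f))` (after `import sys`) would roughly mimic `script.pl input.txt > output.txt`. Lines that do not have the expected shape pass through unchanged.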

Or as a one-liner with a simpler regex. Just remove the first two parenthesized groups, whatever they contain, together with the " to " between them:

perl -pwe 's/ \([^)]+\) to \([^)]+\)//' input.txt


Source: https://stackoverflow.com/questions/7049245/regex-match-before-string-comes-up