regex for n characters or at least m characters

Submitted by 北慕城南 on 2019-12-21 03:13:14

Question


This should be a pretty simple regex question, but I couldn't find an answer anywhere. How would one write a regex that matches either exactly 2 characters or at least 4 characters? Here is my current method of doing it (ignore the regex itself, that's beside the point):

[A-Za-z0_9_]{2}|[A-Za-z0_9_]{4,}

However, this method takes twice as long (approximately 0.3 s slower for me on a 400-line file), so I was wondering whether there is a better way to do it.


Answer 1:


Factor out the common beginning, and anchor it.

^[A-Za-z0-9_]{2}(?:|[A-Za-z0-9_]{2,})$

(Also, you did say to ignore the regex itself, but I guessed you probably wanted 0-9, not 0_9.)

EDIT: Hm, I was sure I read that you wanted to match lines. Remove the anchors (^ and $) if you want to match inside a line as well. If you match full lines only, the anchors will speed things up (well, the front anchor ^ will, at least).
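
As a rough illustration of the anchored form, here is a minimal Python sketch (Python is an arbitrary choice; the pattern itself is not tied to one engine). The function name `full_line_matches` is made up for this example, and it assumes each line is tested as a whole.

```python
import re

# Anchored pattern from the answer above: exactly 2 allowed characters,
# optionally followed by 2 or more (a total of 2, or 4 and up).
LINE_RE = re.compile(r"^[A-Za-z0-9_]{2}(?:|[A-Za-z0-9_]{2,})$")


def full_line_matches(text):
    """Return the lines of `text` that match the pattern in full."""
    # splitlines() strips line terminators, so `$` only ever sees line ends.
    return [line for line in text.splitlines() if LINE_RE.match(line)]


if __name__ == "__main__":
    sample = "a\nab\nabc\nabcd\nabcde\nab cd\n"
    print(full_line_matches(sample))
```

With the sample above, only the lines of length 2, 4 and 5 are kept; the 1-character line, the 3-character line and the line containing a space are dropped.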




Answer 2:


Your solution looks pretty good. As an alternative, you can try something like this:

[A-Za-z0-9_]{2}(?:[A-Za-z0-9_]{2,})?

By the way, I think you want a hyphen instead of an underscore between 0 and 9, don't you?
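
One behavioural difference shows up when the pattern is not anchored. In a backtracking engine that tries alternatives left to right (Python's `re` is used here purely as an example), the `{2}` branch of the original alternation succeeds first, so a 4-character run is reported as two 2-character matches, while the optional-group form takes the whole run. A small sketch, with names chosen only for this illustration:

```python
import re

CLASS = r"[A-Za-z0-9_]"

# Original form from the question (with 0-9 instead of 0_9).
ALTERNATION = re.compile(CLASS + r"{2}|" + CLASS + r"{4,}")
# Form suggested in this answer: 2 characters, optionally 2 or more after.
OPTIONAL_TAIL = re.compile(CLASS + r"{2}(?:" + CLASS + r"{2,})?")

if __name__ == "__main__":
    for text in ("ab", "abc", "abcd", "abcdef"):
        # Print each input with the matches found by the two forms.
        print(text, ALTERNATION.findall(text), OPTIONAL_TAIL.findall(text))
```

Neither form rejects a 3-character run when unanchored: both report `ab` for `abc`, so anchors or boundary checks are still needed if such runs must be skipped.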




Answer 3:


The solution you present is correct.

If you're trying to optimize the routine, and the number of strings matching 2 or more characters is much smaller than the number that do not, consider accepting all strings of length 2 or greater, then discarding those of length exactly 3. This may improve performance because the regex is checked only once, and the second check need not even be a regular expression; checking a string's length is usually an extremely fast operation.

As always, you really need to run tests on real-world data to verify if this would give you a speed increase.
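
A minimal Python sketch of this two-step idea, assuming each candidate string is validated as a whole. The name `is_valid` is hypothetical, and whether this is actually faster depends on the data and the engine.

```python
import re

# One permissive pattern: any run of 2 or more allowed characters.
TWO_OR_MORE = re.compile(r"[A-Za-z0-9_]{2,}")


def is_valid(candidate):
    """True if `candidate` uses only allowed characters and has length 2 or >= 4."""
    # fullmatch checks the whole string; the length test rejects exactly 3.
    return TWO_OR_MORE.fullmatch(candidate) is not None and len(candidate) != 3


if __name__ == "__main__":
    for word in ("a", "ab", "abc", "abcd", "ab!"):
        print(word, is_valid(word))
```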




Answer 4:


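So basically you want to match words of length either 2 or 2+2+N, with N >= 0:

([A-Za-z0-9][A-Za-z0-9](?:[A-Za-z0-9][A-Za-z0-9])*)

Working example:

#!/usr/bin/perl
use strict;
use warnings;

# Read lines from STDIN and print every match found on each line.
while (<STDIN>)
{
    chomp;
    # With /g in list context, the match returns all captures on the line.
    my @matches = ($_ =~ /([A-Za-z0-9][A-Za-z0-9](?:[A-Za-z0-9][A-Za-z0-9])*)/g);
    for my $m (@matches) {
        print "match: $m\n";
    }
}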

Input file:

cat in.txt
ab abc bcad a as asdfa
aboioioi i i abc bcad a as asdfa

Output:

perl t.pl <in.txt
match: ab
match: ab
match: bcad
match: as
match: asdf
match: aboioioi
match: ab
match: bcad
match: as
match: asdf
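
The output above shows the pattern stopping at `asdf` for `asdfa` and picking `ab` out of `abc`, because it consumes characters in pairs and is not tied to word edges. If whole words of length 2 or at least 4 are wanted instead, one possible variant (a sketch in Python, not part of the original answer; `whole_word_matches` is a hypothetical name) adds lookarounds so that a match cannot start or end inside a longer run:

```python
import re

# Lookarounds stop a match from starting or ending inside a longer run
# of allowed characters; the middle part is "2, optionally 2 or more".
WORD_RE = re.compile(
    r"(?<![A-Za-z0-9])[A-Za-z0-9]{2}(?:[A-Za-z0-9]{2,})?(?![A-Za-z0-9])"
)


def whole_word_matches(line):
    """Return the whole words in `line` whose length is 2 or at least 4."""
    return WORD_RE.findall(line)


if __name__ == "__main__":
    for line in ("ab abc bcad a as asdfa", "aboioioi i i abc bcad a as asdfa"):
        for m in whole_word_matches(line):
            print(f"match: {m}")
```

On the first input line this keeps `ab`, `bcad`, `as` and `asdfa`, and skips `abc` and `a` entirely.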


Source: https://stackoverflow.com/questions/8608760/regex-for-n-characters-or-at-least-m-characters
