Fastest way to check if a string matches a regexp in Ruby?

独厮守ぢ 2020-12-13 05:07

What is the fastest way to check if a string matches a regular expression in Ruby?

My problem is that I have to "egrep" through a huge list of strings to find whic

7 answers
  •  刺人心 (original poster)
    2020-12-13 05:57

    This is the benchmark I have run after finding some articles around the net.

    With 2.4.0 the winner is re.match?(str) (as suggested by @wiktor-stribiżew). On previous versions, re =~ str seems to be the fastest, although str =~ re is almost as fast.

    #!/usr/bin/env ruby
    require 'benchmark'
    
    str = "aacaabc"
    re = Regexp.new('a+b').freeze
    
    N = 4_000_000
    
    Benchmark.bm do |b|
        b.report("str.match re\t") { N.times { str.match re } }
        b.report("str =~ re\t")    { N.times { str =~ re } }
        b.report("str[re]  \t")    { N.times { str[re] } }
        b.report("re =~ str\t")    { N.times { re =~ str } }
        b.report("re.match str\t") { N.times { re.match str } }
        if re.respond_to?(:match?)
            b.report("re.match? str\t") { N.times { re.match? str } }
        end
    end
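
For context on what the benchmark above is comparing, here is a small sketch of what each call returns for the same `str` and `re`. The variable names (`index`, `substr`, `data`, `matched`) are only illustrative. `Regexp#match?` returns a plain boolean and, according to the Ruby documentation, does not update `$~`, which is a plausible reason it comes out ahead when only a yes/no answer is needed.

```ruby
str = "aacaabc"
re  = Regexp.new('a+b').freeze

index  = (re =~ str)    # Integer offset where the match starts, or nil
substr = str[re]        # the matched substring, or nil
data   = re.match(str)  # a MatchData object, or nil

# Regexp#match? exists from Ruby 2.4 on; fall back to =~ on older versions.
matched = re.respond_to?(:match?) ? re.match?(str) : !index.nil?

puts index.inspect
puts substr.inspect
puts data.class
puts matched.inspect
```

If captures or the match position are needed afterwards, `=~` or `match` is still the right call; `match?` only fits a pure membership test.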
    

    Results MRI 1.9.3-p551:

    $ ./bench-re.rb  | sort -t $'\t' -k 2
           user     system      total        real
    re =~ str         2.390000   0.000000   2.390000 (  2.397331)
    str =~ re         2.450000   0.000000   2.450000 (  2.446893)
    str[re]           2.940000   0.010000   2.950000 (  2.941666)
    re.match str      3.620000   0.000000   3.620000 (  3.619922)
    str.match re      4.180000   0.000000   4.180000 (  4.180083)
    

    Results MRI 2.1.5:

    $ ./bench-re.rb  | sort -t $'\t' -k 2
           user     system      total        real
    re =~ str         1.150000   0.000000   1.150000 (  1.144880)
    str =~ re         1.160000   0.000000   1.160000 (  1.150691)
    str[re]           1.330000   0.000000   1.330000 (  1.337064)
    re.match str      2.250000   0.000000   2.250000 (  2.255142)
    str.match re      2.270000   0.000000   2.270000 (  2.270948)
    

    Results MRI 2.3.3 (there is a regression in regex matching, it seems):

    $ ./bench-re.rb  | sort -t $'\t' -k 2
           user     system      total        real
    re =~ str         3.540000   0.000000   3.540000 (  3.535881)
    str =~ re         3.560000   0.000000   3.560000 (  3.560657)
    str[re]           4.300000   0.000000   4.300000 (  4.299403)
    re.match str      5.210000   0.010000   5.220000 (  5.213041)
    str.match re      6.000000   0.000000   6.000000 (  6.000465)
    

    Results MRI 2.4.0:

    $ ./bench-re.rb  | sort -t $'\t' -k 2
           user     system      total        real
    re.match? str     0.690000   0.010000   0.700000 (  0.682934)
    re =~ str         1.040000   0.000000   1.040000 (  1.035863)
    str =~ re         1.040000   0.000000   1.040000 (  1.042963)
    str[re]           1.340000   0.000000   1.340000 (  1.339704)
    re.match str      2.040000   0.000000   2.040000 (  2.046464)
    str.match re      2.180000   0.000000   2.180000 (  2.174691)
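
Applied to the "egrep through a list of strings" case from the question, the results above suggest something like the following sketch. `grep_strings` and the sample `words` array are hypothetical names made up for illustration, and the version fallback assumes the code may have to run on a Ruby older than 2.4.

```ruby
# Hypothetical helper: returns the strings in `list` that match `re`.
def grep_strings(list, re)
  if re.respond_to?(:match?)
    # Ruby >= 2.4: boolean check, fastest in the 2.4.0 benchmark
    list.select { |s| re.match?(s) }
  else
    # Older Rubies: =~ returns the match offset (truthy, even 0) or nil
    list.select { |s| re =~ s }
  end
end

re      = Regexp.new('a+b').freeze
words   = %w[aacaabc xyz ab ba]   # sample data, not from the original post
matches = grep_strings(words, re)

puts matches.inspect
```

Building the regexp once outside the loop, as the benchmark does, keeps the per-string cost down to the match itself.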
    
