What's the fastest way to check if a word from one string is in another string?

故里飘歌 2020-12-30 12:56

I have a string of words; let's call them bad:

bad = "foo bar baz"

I can keep this string as a whitespace-separated string,

8 Answers
  •  慢半拍i (OP)
    2020-12-30 13:18

    If the list of bad words gets huge, a hash is a lot faster:

        require 'benchmark'

        # Build a large bad-word list and a long test string.
        bad = ('aaa'..'zzz').to_a    # 17576 words
        str = "What's the fastest way to check if any word from the bad string is within my "
        str += "comparison string, and what's the fastest way to remove said word if it's "
        str += "found"
        str *= 10

        # Approach 1: a single regex that alternates over every bad word.
        badex = /\b(#{bad.join('|')})\b/i

        # Approach 2: one hash lookup per word.
        # Note: this lookup is case-sensitive, unlike the /i regex above.
        bad_hash = {}
        bad.each { |w| bad_hash[w] = true }

        n = 10
        Benchmark.bm(10) do |x|
          x.report('regex:') do
            n.times { str.gsub(badex, '').squeeze(' ') }
          end

          x.report('hash:') do
            n.times do
              str.gsub(/\b\w+\b/) { |word| bad_hash[word] ? '' : word }.squeeze(' ')
            end
          end
        end
                        user     system      total        real
        regex:     10.485000   0.000000  10.485000 ( 13.312500)
        hash:       0.000000   0.000000   0.000000 (  0.000000)
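
    For everyday use outside a benchmark, the same lookup idea can be wrapped in two small helpers. This is only a sketch under a few assumptions: the names `BAD_WORDS`, `contains_bad_word?` and `remove_bad_words` are made up for illustration, a `Set` stands in for the hash of `true` values, and each word is downcased before the lookup so that matching is case-insensitive like the `/i` regex.

    ```ruby
    require 'set'

    # Hypothetical constant: the question's sample list, stored as a Set
    # so each membership test is a hash lookup.
    BAD_WORDS = Set.new("foo bar baz".split)

    # Returns true if any word in text is in the bad-word set.
    # Words are downcased first (an assumption; drop it for exact-case matching).
    def contains_bad_word?(text, bad = BAD_WORDS)
      text.scan(/\w+/).any? { |word| bad.include?(word.downcase) }
    end

    # Removes every listed word, then collapses the leftover runs of spaces.
    def remove_bad_words(text, bad = BAD_WORDS)
      text.gsub(/\b\w+\b/) { |word| bad.include?(word.downcase) ? '' : word }
          .squeeze(' ')
          .strip
    end

    sample = "a Foo walks into a bar"
    puts contains_bad_word?(sample)   # prints true
    puts remove_bad_words(sample)     # prints "a walks into a"
    ```

    Only whole words are matched, so a string such as "food bars" passes through unchanged.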
    
