Find out which words in a large list occur in a small string

暖寄归人 2021-01-19 16:47

I have a static 'large' list of words, about 300-500 words, called 'list1'.

given a relatively short string str of about 40 words, what is the fastes

3 answers
  •  暗喜 (OP)
    2021-01-19 16:51

    Here's my shot at it:

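    # Returns [total match count, {regexp => count}, number of distinct matching regexps].
    def match_freq(exprs, strings)
      patterns = exprs.split.map { |x| Regexp.new(x) }  # one regexp per list word
      words = strings.split
      freq = {}
      # For each pattern, count the words of the string it matches (substring match).
      patterns.each { |r| words.each { |s| freq[r] = (freq[r] || 0) + 1 if s =~ r } }
      [freq.values.inject(0) { |a, x| a + x }, freq, freq.size]
    end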
    
    list1 = "fred sam sandy jack sue bill"
    str = "and so sammy went with jack to see fred and freddie"
    x = match_freq(list1, str)
    x # => [4, {/sam/=>1, /fred/=>2, /jack/=>1}, 3]
    

    The output of "match_freq" is an array of your output items (a, b, c). The algorithm itself is O(n*m), where n is the number of items in list1 and m is the number of words in the input string; I don't think you can do better than that (in terms of big-O). There are smaller optimizations that might pay off, though, such as keeping a separate counter for the total number of matches instead of computing it afterwards. This was just my quick hack at it.
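
    As a rough sketch of the separate-counter idea mentioned above: the variant below increments a running total inside the loop instead of summing the hash values afterwards. The name match_freq_counted is hypothetical (it is not part of the code above), and the nested loop is otherwise unchanged, so this is only a small constant-factor tweak.

```ruby
# Hypothetical variant of match_freq that keeps a running total
# instead of summing the per-pattern counts at the end.
def match_freq_counted(exprs, strings)
  patterns = exprs.split.map { |w| Regexp.new(w) } # one regexp per list word
  words = strings.split
  freq = Hash.new(0) # missing keys read as 0
  total = 0
  patterns.each do |pattern|
    words.each do |word|
      next unless word =~ pattern
      freq[pattern] += 1
      total += 1 # running total, no second pass needed
    end
  end
  [total, freq, freq.size]
end

total, freq, distinct = match_freq_counted(
  "fred sam sandy jack sue bill",
  "and so sammy went with jack to see fred and freddie"
)
# total => 4, distinct => 3
```

    The return value has the same shape as that of match_freq, so it can be used the same way.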

    You can extract just the matching words from the output as follows:

    matches = x[1].keys.map { |r| r.source }.join(" ") # => "sam fred jack"
    

    Note that the order won't necessarily be preserved; if that's important, you'll have to keep a separate list of the order in which the words were found.
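
    One possible way to keep such a list, as a minimal sketch: walk the words of the string in order and record each list word the first time it matches. The helper name matches_in_order is an assumption made for illustration; it uses the same substring matching as match_freq.

```ruby
# Hypothetical helper: list words in order of first appearance in the string.
def matches_in_order(exprs, strings)
  patterns = exprs.split.map { |w| Regexp.new(w) }
  found = []
  # Outer loop over the string's words so that results follow string order.
  strings.split.each do |word|
    patterns.each do |pattern|
      next unless word =~ pattern
      found << pattern.source unless found.include?(pattern.source)
    end
  end
  found
end

ordered = matches_in_order(
  "fred sam sandy jack sue bill",
  "and so sammy went with jack to see fred and freddie"
)
# ordered => ["sam", "jack", "fred"]
```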
