How do I write a method that finds the most common substring in a string in Ruby?

广开言路 2021-01-23 13:34

I have a program with a class DNA. It needs to find the most frequent k-mer in a string, i.e. the most common substring of length k.

2 answers

暖寄归人 (OP)

2021-01-23 13:58

Something like this?

  require 'set'

  def count_kmer(k)
    max_kmers = kmers(k)
                    .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
                    .group_by { |_,v| v }
                    .max
    [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
  end

  def kmers(k)
    nucleotide.chars.each_cons(k).map(&:join)
  end
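
To make the chain in count_kmer easier to follow, here is a rough sketch that runs the same stages one at a time on a plain string and returns each intermediate result. The helper name trace_kmer_count and its hash keys are made up for illustration and are not part of the class.

```ruby
# Hypothetical helper (not part of the DNA class): returns every
# intermediate stage of the count_kmer pipeline for inspection.
def trace_kmer_count(sequence, k)
  # 1. Every overlapping substring of length k, in order of appearance.
  kmers = sequence.chars.each_cons(k).map(&:join)

  # 2. Count how often each k-mer occurs (Hash.new(0) defaults to 0).
  counts = kmers.each_with_object(Hash.new(0)) { |kmer, seen| seen[kmer] += 1 }

  # 3. Group the [kmer, count] pairs by their count.
  by_count = counts.group_by { |_kmer, count| count }

  # 4. Hash#max compares [count, pairs] arrays. The counts are unique
  #    hash keys, so the entry with the largest count wins.
  top_count, top_pairs = by_count.max

  { kmers: kmers, counts: counts, by_count: by_count,
    top_kmers: top_pairs.map(&:first), top_count: top_count }
end

stages = trace_kmer_count('AACCAATCCG', 2)
stages.each { |name, value| puts "#{name}: #{value.inspect}" }
```

For 'AACCAATCCG' and k = 2, the last two stages come out as the k-mers "AA" and "CC" with a count of 2.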

EDIT: Here's the full text of the class:

require 'set'

class DNA
  def initialize(nucleotide)
    @nucleotide = nucleotide
  end

  def length
    @nucleotide.length
  end

  def count_kmer(k)
    max_kmers = kmers(k)
                    .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
                    .group_by { |_,v| v }
                    .max
    [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
  end

  def kmers(k)
    nucleotide.chars.each_cons(k).map(&:join)
  end

  protected
  attr_reader :nucleotide
end

Using Ruby 2.2.1 with the class and method names you specified, this produces the following output:

>> dna1 = DNA.new('AACCAATCCG')
=> #<DNA:0x... @nucleotide="AACCAATCCG">
>> dna1.count_kmer(1)
=> [#<Set: {"A", "C"}>, 4]
>> dna1.count_kmer(2)
=> [#<Set: {"AA", "CC"}>, 2]

As a bonus, you can also do:

>> dna1.kmers(2)
=> ["AA", "AC", "CC", "CA", "AA", "AT", "TC", "CC", "CG"]
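
One thing to watch: as written, count_kmer raises a NoMethodError when k is longer than the sequence, because there are no k-mers and .max returns nil. Below is a possible variant that guards against that. It is only a sketch: most_frequent_kmers is a hypothetical standalone method, returning an empty set with a count of 0 is my own choice for the "no k-mers" case, and it assumes Ruby 2.7 or newer for Enumerable#tally.

```ruby
require 'set'

# Hypothetical standalone helper, not part of the DNA class above.
# Assumes Ruby 2.7+ (Enumerable#tally).
def most_frequent_kmers(sequence, k)
  # No k-mers exist when k is non-positive or longer than the sequence.
  return [Set.new, 0] if k < 1 || k > sequence.length

  # Count each overlapping substring of length k.
  counts = sequence.chars.each_cons(k).map(&:join).tally
  top = counts.values.max

  # Keep every k-mer that reaches the highest count, so ties are preserved.
  [Set.new(counts.select { |_kmer, count| count == top }.keys), top]
end

kmers, count = most_frequent_kmers('AACCAATCCG', 2)
puts "#{kmers.to_a.join(', ')} (#{count} times)"
# prints "AA, CC (2 times)"
```

If you would rather fail loudly on a bad k, raising an ArgumentError in the guard instead of returning an empty result works just as well.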
