I have this program with a class DNA. The program counts the most frequent k-mer in a string. So, it is looking for the most common substring in a string with a length of k.
Something like this?
require 'set'
def count_kmer(k)
max_kmers = kmers(k)
.each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
.group_by { |_,v| v }
.max
[Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
end
def kmers(k)
nucleotide.chars.each_cons(k).map(&:join)
end
EDIT: Here's the full text of the class:
require 'set'
class DNA
def initialize (nucleotide)
@nucleotide = nucleotide
end
def length
@nucleotide.length
end
def count_kmer(k)
max_kmers = kmers(k)
.each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
.group_by { |_,v| v }
.max
[Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
end
def kmers(k)
nucleotide.chars.each_cons(k).map(&:join)
end
protected
attr_reader :nucleotide
end
This produces the following output, using Ruby 2.2.1, using the class and method you specified:
>> dna1 = DNA.new('AACCAATCCG')
=> #
>> dna1.count_kmer(1)
=> [#, 4]
>> dna1.count_kmer(2)
=> [#, 2]
As a bonus, you can also do:
>> dna1.kmers(2)
=> ["AA", "AC", "CC", "CA", "AA", "AT", "TC", "CC", "CG"]