How do I write a method that finds the most common substring of a string in Ruby?

广开言路 2021-01-23 13:34

I have a program with a DNA class. It should find the most frequent k-mer in a string, that is, the most common substring of length k.

2 answers
  •  暖寄归人
    2021-01-23 13:58

    Something like this?

      require 'set'
    
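      def count_kmer(k)
        # Tally every k-mer, then group the [kmer, count] pairs by count so that
        # tied k-mers land in the same bucket; max picks the highest count.
        max_kmers = kmers(k)
                        .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
                        .group_by { |_,v| v }
                        .max
        # Return the set of winning k-mers together with their count.
        [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
      end
    
      # All overlapping substrings of length k, in order of appearance.
      def kmers(k)
        nucleotide.chars.each_cons(k).map(&:join)
      end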
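
To make the chained calls easier to follow, here is a rough standalone walk-through of the same pipeline, one stage at a time. The variable names (sequence, windows, counts, by_count) are illustrative and not part of the class; max_by is used in place of max only to make the comparison key explicit.

```ruby
# Standalone walk-through; `sequence` and `k` are illustrative values.
sequence = 'AACCAATCCG'
k = 2

# 1. Every overlapping window of k characters.
windows = sequence.chars.each_cons(k).map(&:join)

# 2. Count each window (Hash.new(0) makes missing keys default to 0).
counts = windows.each_with_object(Hash.new(0)) { |kmer, tally| tally[kmer] += 1 }

# 3. Group the [kmer, count] pairs by count, so ties share one bucket.
by_count = counts.group_by { |_kmer, count| count }

# 4. Pick the bucket with the highest count.
top_count, top_pairs = by_count.max_by { |count, _pairs| count }

p windows
p counts
p top_count
p top_pairs.map(&:first)
```

Grouping by count before taking the maximum is what lets the method return every k-mer that ties for first place, rather than just one of them.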
    

    EDIT: Here's the full text of the class:

    require 'set'
    
    class DNA
      def initialize(nucleotide)
        @nucleotide = nucleotide
      end
    
      def length
        @nucleotide.length
      end
    
      def count_kmer(k)
        max_kmers = kmers(k)
                        .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
                        .group_by { |_,v| v }
                        .max
        [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
      end
    
      def kmers(k)
        nucleotide.chars.each_cons(k).map(&:join)
      end
    
      protected
      attr_reader :nucleotide
    end
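
One edge case worth noting: when k is larger than the sequence, kmers(k) is empty, max returns nil, and count_kmer raises NoMethodError. A possible variant that guards against this is sketched below. It assumes Ruby 2.7 or newer (for Enumerable#tally), and most_frequent_kmers is a hypothetical standalone helper, not part of the class above. Returning an empty set with a count of 0 is just one reasonable choice for the out-of-range case.

```ruby
require 'set'

# Hypothetical standalone helper, not part of the original DNA class.
# Assumes Ruby 2.7+, where Enumerable#tally is available.
def most_frequent_kmers(sequence, k)
  # No windows exist when k is out of range, so return an empty result
  # instead of calling methods on nil.
  return [Set.new, 0] if k < 1 || k > sequence.length

  counts = sequence.chars.each_cons(k).map(&:join).tally
  top = counts.values.max
  # Keep every k-mer that reaches the top count, so ties are preserved.
  [Set.new(counts.select { |_kmer, count| count == top }.keys), top]
end

p most_frequent_kmers('AACCAATCCG', 2)
p most_frequent_kmers('AACCAATCCG', 11)
```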
    

    With Ruby 2.2.1, the class and method you specified produce the following output:

    >> dna1 = DNA.new('AACCAATCCG')
    => #<DNA:0x... @nucleotide="AACCAATCCG">
    >> dna1.count_kmer(1)
    => [#<Set: {"A", "C"}>, 4]
    >> dna1.count_kmer(2)
    => [#<Set: {"AA", "CC"}>, 2]
    

    As a bonus, kmers is a public method, so you can also list every k-mer directly:

    >> dna1.kmers(2)
    => ["AA", "AC", "CC", "CA", "AA", "AT", "TC", "CC", "CG"]
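
The list above contains 9 entries because a string of length n has n - k + 1 overlapping windows of length k (here 10 - 2 + 1). As a small illustration, the same windows can be produced by slicing the string directly instead of going through chars.each_cons. The name kmers_by_slicing is hypothetical and only used for this sketch.

```ruby
# Illustrative alternative to chars.each_cons; `kmers_by_slicing` is a
# hypothetical name, not part of the original answer.
def kmers_by_slicing(sequence, k)
  # Out-of-range k yields no windows at all.
  return [] if k < 1 || k > sequence.length

  # One window starts at each index from 0 to n - k.
  (0..sequence.length - k).map { |start| sequence[start, k] }
end

p kmers_by_slicing('AACCAATCCG', 2)
p kmers_by_slicing('AACCAATCCG', 2).length
```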
    
