Use SLIM/HAML etc. in a Ruby script?

Submitted by 魔方 西西 on 2019-12-11 05:59:57

Question


I am currently writing a script that analyses some genetic data and then produces the output as a coloured Word document. The script works; however, one method in it is badly written: the method that creates the Word document.

The method creating the document creates a standalone HTML file, which is then saved with a 'docx' extension, which allows me to give different parts of the document different styles.

Below is the bare minimum to get this to work. It includes some sample input data which would be created in a different method just before the final step and stored in a hash, and the necessary methods.

require 'bio'

def make_hash(input_file)
  input_read = Hash.new
  biofastafile = Bio::FlatFile.open(Bio::FastaFormat, input_file) 
  biofastafile.each_entry do |entry|
    input_read[entry.definition] = entry.aaseq
  end
  return input_read
end

def to_doc(hash, output, motif)
  output_file = File.new(output, "w")
  output_file.puts "<!DOCTYPE html><html><head><style> .id{font-weight: bold;} .signalp{color:#000099; font-weight: bold;} .motif{color:#FF3300; font-weight: bold;} h3 {word-wrap: break-word;} p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}</style></head><body>"
  hash.each do |id, seq|
    sequence = seq.to_s.gsub("\[\"", "").gsub("\"\]", "")
    id.scan(/(\w+)(.*)/) do |id_start, id_end|
      output_file.puts "<p><span class=\"id\"> >#{id_start}</span><span>#{id_end}</span><br>"
      output_file.puts "<span class=\"signalp\">"
      sequence.scan(/(\w+)-(\w+)/) do |signalp, seq_end|
        output_file.puts signalp + "</span>" + seq_end.gsub(/#{motif}/, '<span class="motif">\0</span>')
        output_file.puts "</p>"
      end
    end
  end
  output_file.puts "</body></html>"
  output_file.close   
end

hash = make_hash("./sample.txt")
to_doc(hash, "output.docx", "WL|KK|RR|KR|R..R|R....R")

This is some sample data. In reality, when analysing the genetic data from a species, the input can consist of hundreds of thousands of sequences:

>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00001_f4_15 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00003_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00003_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00004_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00004_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00009_f2_3 - Signal P Cleavage Site => 22:23
MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD
>isotig00009_f3_9 - Signal P Cleavage Site => 16:17
MKTGIIIFISTVVVLP-ITLKPCGVPFSCCIPDQASGVANTQCGYGVRSPEQQNTFHTKIYTTGCADMFTMWINRYLYYIAGIAGVIVLVELFGFCFAHSLINDIKRQKARWAHR
>isotig00009_f6_13 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00009_f6_14 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL

Each read is made of two parts: The seq id (the line starting with a >) and the sequence. This is split, and stored in a hash in the make_hash method. This example:

>isotig00001_f4_14 - Signal P Cleavage Site => 11:12

MMHLLCIVLLL-KWWLLL 

Is made up of:

>isotig00001_f4_14  (the first part of the id - class="id")

Signal P Cleavage Site => 11:12 (the second part of the id - normal writing)

(new line)

MMHLLCIVLLL (first part of the sequence - class="signalp")

KW WL LL  (the second part of the sequence - the motif WL will be class="motif")

In HTML it would produce:

<p>
  <span class="id"> >isotig00001_f4_14</span><span>Signal P Cleavage Site => 11:12</span>
<br>
  <span class="signalp">MMHLLCIVLLL</span><span>KW</span><span class="motif">WL</span><span>LL</span>
</p>

Basically, I would like to rewrite the to_doc method using a proper HTML templating library such as Slim, Haml, Nokogiri or ERB. I have tried to do this but could not get it to work.

For some reason, a loop within a loop didn't work, and creating a global variable to store these values didn't work either.

The script above works; just save the sample data as "sample.txt" and then run the script.

I would be highly grateful for any help.


Answer 1:


Here's a starting point:

require 'haml'

haml_doc = <<EOT
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
EOT

engine = Haml::Engine.new(haml_doc)
puts engine.render

Which outputs this when run:

<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body></body>
</html>

From there, you can easily write to a file using:

File.write(output, engine.render)

instead of using puts to output it to the console.

To use this, you need to flesh out haml_doc with additional Haml that loops over your input data. Before rendering, massage that data into an array or hash that you can iterate over cleanly, so the template does not embed all sorts of scan and conditional logic. A view should primarily output content, not manipulate data.

Just above the engine = Haml... line you'd want to read your input data and massage it, and store it in an instance variable that Haml can iterate over. You have the basic idea in your original code but instead of trying to output HTML, create an object or sub-hash that you can pass to Haml.

Normally this would all be separated into separate files for the model, the view and the controller, like in Rails or big Sinatra apps, but this really isn't a big app, so you can put it all in one file. Keep your logic clean and it'll be fine.

Without sample input data and an expected output it's hard to do more, but that'll give you a starting point.


Based on the data samples, here's something that gets you in the ballpark. I won't polish it because, after all, you have to do some of it, but this is a reasonable start. The first part mocks up something reasonably like the Bio module you reference in your code, which I've never seen. You don't need this part, but might want to look through it:

module Bio

  FastaFormat = 1

SAMPLE_DATA = <<-EOT
>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00001_f4_15 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00003_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00003_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00004_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00004_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00009_f2_3 - Signal P Cleavage Site => 22:23
MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD
>isotig00009_f3_9 - Signal P Cleavage Site => 16:17
MKTGIIIFISTVVVLP-ITLKPCGVPFSCCIPDQASGVANTQCGYGVRSPEQQNTFHTKIYTTGCADMFTMWINRYLYYIAGIAGVIVLVELFGFCFAHSLINDIKRQKARWAHR
>isotig00009_f6_13 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00009_f6_14 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
EOT

  class FlatFile

    class Entry
      attr_reader :definition, :aaseq

      def initialize(definition, aaseq)
        @definition = definition
        @aaseq = aaseq
      end
    end

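    # Enumerable supplies map, select, etc. on top of each.
    include Enumerable

    def initialize(entries)
      @sample_data = entries
    end

    # The mock ignores both arguments and always serves SAMPLE_DATA.
    def self.open(filetype, filename)
      new(SAMPLE_DATA.split("\n").each_slice(2).map { |seq_id, sequence| Entry.new(seq_id, sequence) })
    end

    def each_entry
      @sample_data.each do |entry|
        yield entry
      end
    end
    alias each each_entry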

  end
end

Here's where the fun begins. I modified your make_hash routine to parse the strings how I'd do it. Instead of a hash, it returns an array of hashes. Each sub-hash is ready to be used; in other words, the data is parsed and ready to be output:

include Bio

def make_array_of_hashes(input_file)
  Bio::FlatFile.open(
    Bio::FastaFormat,
    input_file
  ).map { |entry|

    id_start, id_end = entry.definition.split('-').map(&:strip)
    signalp, seq_end = entry.aaseq.split('-')
    motif = seq_end.scan(/(?:WL|KK|RR|KR|R..R|R....R)/)

    {
      :id_start => id_start,
      :id_end => id_end,
      :signalp => signalp,
      :motif => motif
    }
  }
end
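
The hash above records which motifs occur but drops the residues between them, whereas the HTML in the question keeps the unmatched parts of the sequence in plain spans. A minimal sketch of one way to retain them follows. The helper name segment_by_motif and the :text/:motif keys are hypothetical, not part of the original answer or of BioRuby, and the default pattern is simply the one from the question:

```ruby
# Split the part of a sequence after the cleavage site into ordered
# segments, flagging which ones matched the motif pattern.
# Hypothetical helper: the name and return shape are assumptions.
def segment_by_motif(tail, motif = /WL|KK|RR|KR|R..R|R....R/)
  segments = []
  pos = 0

  tail.scan(motif) do
    match = Regexp.last_match

    # Plain residues between the previous match and this one.
    if match.begin(0) > pos
      segments << { :text => tail[pos...match.begin(0)], :motif => false }
    end

    segments << { :text => match[0], :motif => true }
    pos = match.end(0)
  end

  # Whatever is left after the last motif.
  segments << { :text => tail[pos..-1], :motif => false } if pos < tail.length
  segments
end
```

Calling segment_by_motif('KWWLLL') yields three segments (KW, WL, LL) with only WL flagged, so a template could loop over them and add the motif class where the flag is set.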

This is a simple way to define the Haml document inside the body of a script. The template only outputs; it contains no logic except the loops. Everything else is handled before the view is processed:

haml_doc = <<EOT
!!!
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
    - data.each do |d|
      %p
        %span.id= d[:id_start]
        %span= d[:id_end]
        %br/
        %span.signalp= d[:signalp]
        - d[:motif].each do |m|
          %span= m
EOT

And here's how to use it:

require 'haml'

data = make_array_of_hashes('sample.txt')

engine = Haml::Engine.new(haml_doc)
puts engine.render(Object.new, :data => data)

Which, when run, outputs:

<!DOCTYPE html>
<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body>
    <p>
      <span class='id'>>isotig00001_f4_14</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00001_f4_15</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00003_f6_8</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00003_f6_9</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00004_f6_8</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00004_f6_9</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f2_3</span>
      <span>Signal P Cleavage Site => 22:23</span>
      <br>
      <span class='signalp'>MLKCFSIIMGLILLLEIGGGCA</span>
      <span>KR</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f3_9</span>
      <span>Signal P Cleavage Site => 16:17</span>
      <br>
      <span class='signalp'>MKTGIIIFISTVVVLP</span>
      <span>KR</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f6_13</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f6_14</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
  </body>
</html>
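
The question also lists ERB, which ships with Ruby's standard library, so the same "prepare the data first, then only loop in the view" idea can be sketched without installing a gem. This is an illustration under stated assumptions, not the answer's method: the one-record data array is a hypothetical stand-in for an array of hashes keyed by :id_start, :id_end, :signalp and :motif, the motif class on the inner span is an addition, and the trim_mode keyword and result_with_hash assume Ruby 2.6 or newer:

```ruby
require 'erb'

# Hypothetical stand-in for parsed input: one hash per sequence.
data = [
  { :id_start => '>isotig00001_f4_14',
    :id_end   => 'Signal P Cleavage Site => 11:12',
    :signalp  => 'MMHLLCIVLLL',
    :motif    => ['WL'] }
]

# The template only loops and prints. html_escape guards the '>' and
# '=>' characters that appear in the sequence ids.
erb_doc = <<'EOT'
<!DOCTYPE html>
<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
    </style>
  </head>
  <body>
<% data.each do |d| -%>
    <p>
      <span class="id"><%= ERB::Util.html_escape(d[:id_start]) %></span>
      <span><%= ERB::Util.html_escape(d[:id_end]) %></span>
      <br>
      <span class="signalp"><%= ERB::Util.html_escape(d[:signalp]) %></span>
<% d[:motif].each do |m| -%>
      <span class="motif"><%= ERB::Util.html_escape(m) %></span>
<% end -%>
    </p>
<% end -%>
  </body>
</html>
EOT

# trim_mode '-' lets the "-%>" tags swallow their trailing newline.
html = ERB.new(erb_doc, trim_mode: '-').result_with_hash(data: data)
```

From here, File.write('output.docx', html) would save the result the same way the Haml version does. The nested each calls show that a loop within a loop works as long as the data is passed into the template explicitly.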


Source: https://stackoverflow.com/questions/19564803/use-slim-haml-etc-in-a-ruby-script
