问题
Description
Given a dictionary, my program generates two output files, 'sequences.txt' and 'words.txt'.
- 'sequences' contain every sequence of four letters (A-z) that appear in exactly one word of the dictionary, one sequence per line.
- 'words' will contain the corresponding words that contain the sequences, in the same order, again one per line.
For example, given spec/fixtures/sample_words.txt
dictionary containing only
arrows
carrots
give
me
The outputs should be:
'sequences' 'words'
carr carrots
give give
rots carrots
rows arrows
rrot carrots
rrow arrows
Of course, 'arro' does not appear in the output since it is found in more than one word.
What I've come up with so far
Project structure:
├── Gemfile
├── Gemfile.lock
├── examples
│ └── dictionary.txt
├── lib
│ └── word_sequence_parser.rb
├── main.rb
├── output
├── readme.md
└── spec
├── fixtures
│ └── sample_words.txt
└── word_sequence_parser_spec.rb
To run the script:
ruby main.rb examples/dictionary.txt
main.rb
require_relative 'lib/word_sequence_parser.rb'
dict_path = ARGV.shift
if dict_path.nil?
dict_path = 'spec/fixtures/sample_words.txt'
end
parser = WordSequenceParser.new(dict_path)
# step 1 - Opens dictionary file and generates a new set of words
parser.set
# step 2 - Parses word sequences
parser.sequence
# step 3 - Prints to files in ./output
parser.dump_text
The script that works
word_sequence_parser.rb
require 'set'
class WordSequenceParser
def initialize(path)
@path = path
end
def set
set = Set.new
File.open(@path) do |f|
f.each_line do |line|
set.add(line.chomp.downcase)
end
end
set
end
def sequence
sequences = Set.new
words = Set.new
to_remove = Set.new
set.each do |w|
letters = w.split(//)
letters.each_cons(4) do |seq|
s = seq.join
if !words.add?(s)
to_remove.add(s)
end
sequences.add( {seq: s, word: w} )
end
end
sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
end
def dump_text
output_s = File.open( 'output/sequences.txt', 'w' )
output_w = File.open( 'output/words.txt', 'w' )
sequence.each do |hash|
output_s.puts("#{hash[:seq]}")
output_w.puts("#{hash[:word]}")
end
output_s.close
output_w.close
end
end
My shot at the script with memorization that doesn't work
require 'set'
class WordSequenceParser
def initialize(path)
@path = path
end
def set
set = Set.new
File.open(@path) do |f|
f.each_line do |line|
set.add(line.chomp.downcase)
end
end
set
end
def memoize
@set = set
end
def sequence
sequences = Set.new
words = Set.new
to_remove = Set.new
@set.each do |w|
letters = w.split(//)
letters.each_cons(4) do |seq|
s = seq.join
if !words.add?(s)
to_remove.add(s)
end
sequences.add( {seq: s, word: w} )
end
end
sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
end
def dump_text
output_s = File.open( 'output/sequences.txt', 'w' )
output_w = File.open( 'output/words.txt', 'w' )
sequence.each do |hash|
output_s.puts("#{hash[:seq]}")
output_w.puts("#{hash[:word]}")
end
output_s.close
output_w.close
end
end
I get this error message when trying to run the script.
../word_sequence_parser.rb:29:in `sequence': undefined method `each' for nil:NilClass (NoMethodError)
from main.rb:15:in `<main>'
I've read up on Justin Weiss' article on memoization and for the most part get it. Just having a hard time implementing this technique into something I've already written.
回答1:
It does not work since you never call memoize, so @set is never initialized.
However the real problem here, is that there is nothing to memoize.
Your original code looks pretty good, and if you think about how it works there is no redundant execution of any of the code. Every line that is executed either once, or if more than once, returns a different value.
Thus there is no purpose in memoization.
Lets say however you wanted to call dump_text (or just sequence) multiple times then you would definitely want to memoize sequence as follows:
def sequence
@sequence ||= begin
sequences = Set.new
words = Set.new
to_remove = Set.new
set.each do |w|
letters = w.split(//)
letters.each_cons(4) do |seq|
s = seq.join
if !words.add?(s)
to_remove.add(s)
end
sequences.add( {seq: s, word: w} )
end
end
sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
end
end
This will only execute your original sequence calculating code once, then assign @sequence. Every other call to @sequence will reuse the value of @sequence already calculated (because its now not nil.)
I love this question because this was the first thing I remember when my company started using ruby. We had a consultant redoing a lot of old asp.net code, and he had these @foo ||= ... expressions in methods, which I had never seen before.
来源:https://stackoverflow.com/questions/34697840/word-parser-script-and-implementing-memoization