Parser in Ruby: #slice! inside #each_with_index = missing element

Question

Let's say I want to extract certain combinations of elements from an array. For example:

data = %w{ start before rgb 255 255 255 between hex FFFFFF after end }
rgb, hex = [], []
data.each_with_index do |v,i|
  p [i,v]
  case v.downcase
    when 'rgb' then rgb  = data.slice! i,4
    when 'hex' then hex  = data.slice! i,2
  end
end
pp [rgb, hex, data]
# >> [0, "start"]
# >> [1, "before"]
# >> [2, "rgb"]
# >> [3, "hex"]
# >> [4, "end"]
# >> [["rgb", "255", "255", "255"],
# >>  ["hex", "FFFFFF"],
# >>  ["start", "before", "between", "after", "end"]]

The code has done the extraction correctly, but the loop skipped the element immediately after each extracted set (note that "between" and "after" never appear in the printed output). So if my data array is

data = %w{ start before rgb 255 255 255 hex FFFFFF after end }

then

pp [rgb, hex, data]
# >> [["rgb", "255", "255", "255"],
# >>  [],
# >>  ["start", "before", "hex", "FFFFFF", "after", "end"]]

Why does this happen? How can I get at those skipped elements inside #each_with_index? Or maybe there is a better solution for this problem, assuming that there are many more sets to extract?

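Answer 1:

The problem is that you are mutating the collection while you are iterating over it. #each_with_index walks the array by position, so when #slice! removes elements, everything after them shifts left and the element that lands on the current position is never visited. This cannot work reliably. (And in my opinion, it shouldn't. Ruby should raise an exception in this case, instead of silently allowing incorrect behavior, as some other languages do for at least some of their collections.)

The following is the best I could come up with while still keeping your original style: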
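To make the skipping visible in isolation, here is a minimal sketch that is not part of the original code. The names letters and seen are made up for the illustration, and the output shown is what CRuby's index-based Array#each produces.

```ruby
letters = %w[a b c d]
seen = []

letters.each do |letter|
  seen << letter
  # Deleting "b" shifts "c" into index 1, which the loop has already passed.
  letters.delete(letter) if letter == 'b'
end

p seen
p letters
# >> ["a", "b", "d"]
# >> ["a", "c", "d"]
```

The block never receives "c", which is the same effect as the missing "between" and "hex" elements in the question.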

require 'pp'

data = %w[start before rgb 255 255 255 hex FFFFFF after end]

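# Number of value tokens still owed to the most recent rgb/hex keyword.
rgb_count = hex_count = 0

# acc holds the three buckets [rgb, hex, rest]; tap returns acc after each
# step, so reduce keeps passing the same three arrays along.
rgb, hex, rest = data.reduce([[], [], []]) do |acc, el|
  acc.tap do |rgb, hex, rest|
    # A keyword starts a set; the counters then route the following values.
    next (rgb_count = 3  ; rgb << el) if /rgb/i =~ el
    next (rgb_count -= 1 ; rgb << el) if rgb_count > 0
    next (hex_count = 1  ; hex << el) if /hex/i =~ el
    next (hex_count -= 1 ; hex << el) if hex_count > 0
    # Anything else is an ordinary word.
    rest << el
  end
end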

data.replace(rest)

pp rgb, hex, data
# ["rgb", "255", "255", "255"]
# ["hex", "FFFFFF"]
# ["start", "before", "after", "end"]

However, what you have is a parsing problem and that should really be solved by a parser. A simple hand-rolled parser/state machine will probably be a little bit more code than the above, but it will be so much more readable.

Here's a simple recursive-descent parser that solves your problem:

class ColorParser
  def initialize(input)
    @input = input.dup
    @rgb, @hex, @data = [], [], []
  end

  def parse
    parse_element until @input.empty?
    return @rgb, @hex, @data
  end

  private

  def parse_element
    parse_color or parse_stop_word
  end

  def parse_color
    parse_rgb or parse_hex
  end

  def parse_rgb
    return unless /rgb/i =~ peek
    @rgb << consume
    parse_rgb_values
  end

I really like recursive-descent parsers because their structure almost perfectly matches the grammar: just keep parsing elements until the input is empty. What is an element? Well, it's a color specification or a stop word. What is a color specification? Well, it's either an RGB color specification or a hex color specification. What is an RGB color specification? Well, it's something that matches the Regexp /rgb/i followed by RGB values. What are RGB values? Well, it's just three numbers …

  def parse_rgb_values
    3.times do @rgb << consume.to_i end
  end

  def parse_hex
    return unless /hex/i =~ peek
    @hex << consume
    parse_hex_value
  end

  def parse_hex_value
    @hex << consume.to_i(16)
  end

  def parse_stop_word
    @data << consume unless /rgb|hex/i =~ peek
  end

  def consume
    @input.slice!(0)
  end

  def peek
    @input.first
  end
end

Use it like so:

data = %w[start before rgb 255 255 255 hex FFFFFF after end]
rgb, hex, rest = ColorParser.new(data).parse

require 'pp'

pp rgb, hex, rest
# ["rgb", 255, 255, 255]
# ["hex", 16777215]
# ["start", "before", "after", "end"]

For comparison, here's the grammar:

S → element*
element → color | word
color → rgb | hex
rgb → 'rgb' rgbvalues
rgbvalues → token token token
hex → 'hex' hexvalue
hexvalue → token
word → token

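Answer 2:

Because you are modifying data in place while iterating over it.

When the loop reaches rgb at index 2, the next element would normally be 255. But slice! removes rgb and its three values, so between now sits at index 2, where rgb was, and the next index the loop visits (3) holds hex. between is never yielded to the block. In your second example it is hex that moves into index 2, so it is skipped and never extracted.

Something like this may work better for you. It removes only the values and leaves the rgb and hex keywords in data, so nothing shifts into the current position: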

when 'rgb' then rgb  = data.slice! i+1,3
when 'hex' then hex  = data.slice! i+1,1
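If the keywords should be removed from data as well, one possible alternative (a sketch, not something the answer above proposes) is to manage the index by hand and advance it only when nothing was removed. The variable i and the while loop are assumptions of this illustration.

```ruby
data = %w[start before rgb 255 255 255 hex FFFFFF after end]
rgb, hex = [], []

i = 0
while i < data.length
  case data[i].downcase
  # After slice!, the following element moves into position i,
  # so the index is deliberately left unchanged in these branches.
  when 'rgb' then rgb = data.slice!(i, 4)
  when 'hex' then hex = data.slice!(i, 2)
  else i += 1
  end
end

p rgb, hex, data
# >> ["rgb", "255", "255", "255"]
# >> ["hex", "FFFFFF"]
# >> ["start", "before", "after", "end"]
```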

Answer 3:

Here is a slightly nicer solution, which consumes the array from the front instead of iterating over it:

data = %w{ start before rgb 255 255 255 hex FFFFFF hex EEEEEE after end }
rest, rgb, hex = [], [], []
until data.empty?
  case (key = data.shift).downcase
    when 'rgb' then rgb  += [key] + data.shift(3)
    when 'hex' then hex  += [key] + data.shift(1)
    else rest << key
  end
end
p rgb, hex, rest
# >> ["rgb", "255", "255", "255"]
# >> ["hex", "FFFFFF", "hex", "EEEEEE"]
# >> ["start", "before", "after", "end"]
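Since the question mentions many more sets to extract, the same shift-based loop could be driven by a lookup table instead of one when branch per keyword. The following is only a sketch under that assumption: the method name extract_sets and the arity table (keyword mapped to the number of values that follow it) are hypothetical and do not come from any of the answers.

```ruby
# arity maps each lowercase keyword to the number of value tokens after it.
def extract_sets(tokens, arity = { 'rgb' => 3, 'hex' => 1 })
  queue = tokens.dup                        # leave the caller's array untouched
  found = Hash.new { |hash, key| hash[key] = [] }
  rest  = []

  until queue.empty?
    token = queue.shift
    key   = token.downcase
    if arity.key?(key)
      # Store the keyword together with the values that belong to it.
      found[key] << [token, *queue.shift(arity[key])]
    else
      rest << token
    end
  end

  [found, rest]
end

found, rest = extract_sets(%w[start before rgb 255 255 255 hex FFFFFF hex EEEEEE after end])
p found['rgb'], found['hex'], rest
# >> [["rgb", "255", "255", "255"]]
# >> [["hex", "FFFFFF"], ["hex", "EEEEEE"]]
# >> ["start", "before", "after", "end"]
```

Unlike the version above, this keeps each occurrence as its own sub-array, and supporting a new keyword only requires a new entry in the table.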

Source: https://stackoverflow.com/questions/3343726/parser-in-ruby-slice-inside-each-with-index-missing-element

Tags

ruby

parsing

each

slice