Convert non-breaking spaces to spaces in Ruby

春和景丽 2020-12-15 03:56

I have cases where user-entered data from an HTML textarea or input is sometimes sent with \u00a0 (non-breaking spaces) instead of regular spaces when encoded as UTF-8.

6 Answers
  •  春和景丽
    2020-12-15 04:40

    Working IRB examples that answer the question, run on the latest Rubies as of May 2012.

    Ruby 1.9

    require 'rubygems'
    require 'nokogiri'
    RUBY_DESCRIPTION # => "ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]"
    doc = '<html><body> &nbsp; </body></html>'
    page = Nokogiri::HTML(doc)
    s = page.inner_text
    s.each_codepoint {|c| print c, ' ' } #=> 32 160 32
    s.strip.each_codepoint {|c| print c, ' ' } #=> 160 (strip leaves U+00A0 in place)
    s.gsub(/\s+/,'').each_codepoint {|c| print c, ' ' } #=> 160 (\s does not match U+00A0)
    s.gsub(/\u00A0/,'').strip.empty? #=> true
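
    The Ruby 1.9 session above removes the non-breaking space, while the question asks for it to become a regular space. A minimal sketch of that conversion for Ruby 1.9 or later follows. The names NBSP, nbsp_to_space, and strip_unicode_space are hypothetical helpers, not part of the original answer, and the sketch assumes the input is a UTF-8 string. It relies on the POSIX bracket class [[:space:]], which, unlike \s, also matches U+00A0 in UTF-8 strings.

```ruby
# Hypothetical helpers for illustration; assumes UTF-8 encoded input.
NBSP = "\u00A0"

# Replace every non-breaking space with an ordinary ASCII space.
def nbsp_to_space(text)
  text.gsub(NBSP, " ")
end

# Strip leading and trailing whitespace, including U+00A0.
# [[:space:]] covers Unicode whitespace, whereas \s and String#strip
# only handle ASCII whitespace.
def strip_unicode_space(text)
  text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

nbsp_to_space("foo\u00A0bar")          # => "foo bar"
strip_unicode_space(" \u00A0 ").empty? # => true
```

    Converting first and stripping afterwards keeps interior word separators intact, which is usually what is wanted for form input.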
    

    Ruby 1.8

    require 'rubygems'
    require 'nokogiri'
    RUBY_DESCRIPTION # => "ruby 1.8.7 (2012-02-08 patchlevel 358) [x86_64-linux]"
    doc = '<html><body> &nbsp; </body></html>'
    page = Nokogiri::HTML(doc)
    s = page.inner_text # => " \302\240 "
    s.gsub(/\s+/,'') # => "\302\240"
    s.gsub(/\302\240/,'').strip.empty? # => true
    
