Force strings to UTF-8 from any encoding

不知归路 2020-12-13 13:36

In my Rails app I'm working with RSS feeds from all around the world, and some feeds have links that are not in UTF-8. The original feed links are out of my control, and i

4 answers
  • 2020-12-13 13:49

    Only this solution worked for me:

    string.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
    

    Note the 'binary' argument: it is the source encoding, so the input is treated as raw bytes.
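
    To see what the 'binary' source encoding does in practice, here is a minimal sketch. The sample bytes are made up for illustration. Because no byte above 0x7F has a mapping from binary to UTF-8, this approach also drops characters that were already valid UTF-8:

```ruby
# Bytes: valid UTF-8 for "café", then "!" and a stray 0xFF byte.
raw = "caf\xC3\xA9!\xFF"

# With 'binary' as the source encoding, every byte >= 0x80 has no UTF-8
# mapping, so the :undef replacement removes it.
ascii_only = raw.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

puts ascii_only                  # the "é" bytes are gone along with 0xFF
puts ascii_only.valid_encoding?
```

    This is fine when plain ASCII output is acceptable, but it is lossy for feeds that contain accented or non-Latin text.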

  • 2020-12-13 13:56

    This replaces any invalid or undefined character with an empty string, so it won't raise an error and you always end up with a valid UTF-8 string:

    str.encode(Encoding.find('UTF-8'), invalid: :replace, undef: :replace, replace: '')
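
    If the string is already tagged as UTF-8 and only contains a few stray bytes, String#scrub may be a simpler way to get the same effect. This is a sketch assuming Ruby 2.1 or later, where scrub is available. The sample bytes are invented for illustration:

```ruby
# UTF-8 bytes for "café" followed by a stray 0xFF byte.
dirty = "caf\xC3\xA9\xFF"

puts dirty.valid_encoding?   # the stray byte makes the string invalid

# scrub returns a copy with invalid byte sequences replaced by the argument.
clean = dirty.scrub('')
puts clean
puts clean.valid_encoding?
```

    Unlike transcoding from 'binary', this keeps the valid multi-byte characters and removes only the broken bytes.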
    
  • 2020-12-13 14:03

    Iconv

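    require 'iconv'
    # Converter from Latin-1 to UTF-8 (arguments are: to, from).
    i = Iconv.new('UTF-8', 'LATIN1')
    # 0xC2 is "Â" in Latin-1; the result is its two-byte UTF-8 form.
    a_with_hat = i.iconv("\xc2")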
    

    Summary: the iconv gem does all the work of converting encodings. Make sure it's installed with:

    gem install iconv
    

    You need to know what encoding your string is currently in, because Ruby 1.8 treats a String as an array of bytes with no intrinsic encoding. For example, say your string is in Latin-1 and you want to convert it to UTF-8:

    require 'iconv'
    
    string_in_utf8_encoding = Iconv.conv("UTF-8", "LATIN1", string_in_latin1_encoding)
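
    Iconv was removed from the standard library in Ruby 2.0, so on newer Rubies the same conversion can be sketched with the built-in String#encode. The sample input is an assumption for illustration, and the source encoding still has to be known in advance:

```ruby
# "café" as Latin-1 bytes: 0xE9 is "é" in ISO-8859-1.
string_in_latin1_encoding = "caf\xE9".b

# encode(destination, source): read the bytes as ISO-8859-1, write UTF-8.
string_in_utf8_encoding = string_in_latin1_encoding.encode('UTF-8', 'ISO-8859-1')

puts string_in_utf8_encoding
puts string_in_utf8_encoding.bytes.inspect
```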
    
  • 2020-12-13 14:10

    Ruby 1.9

    "Forcing" an encoding is easy. However, it doesn't convert the characters; it only changes the encoding label attached to the bytes:

    str = str.force_encoding('UTF-8')
    
    str.encoding.name # => 'UTF-8'
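
    A small sketch of why relabelling alone is not enough. The Latin-1 sample bytes are made up for illustration. The bytes stay exactly the same, so a non-UTF-8 string remains invalid after force_encoding:

```ruby
latin1_bytes = "caf\xE9".b                       # raw Latin-1 bytes, tagged as binary
relabelled = latin1_bytes.dup.force_encoding('UTF-8')

puts relabelled.encoding.name                    # the label has changed
puts relabelled.bytes == latin1_bytes.bytes      # the bytes have not
puts relabelled.valid_encoding?                  # 0xE9 alone is not valid UTF-8
```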
    

    If you want to perform a conversion, use encode:

    begin
      str.encode("UTF-8")
    rescue Encoding::UndefinedConversionError
      # ...
    end
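
    For feeds of unknown origin, one possible way to combine relabelling and conversion is sketched here. The helper name to_utf8 and the ISO-8859-1 fallback are assumptions for illustration, not part of Ruby or Rails. A wrong fallback guess will produce valid but incorrect characters:

```ruby
# Hypothetical helper: not part of Ruby or Rails.
def to_utf8(raw, fallback: Encoding::ISO_8859_1)
  # First, check whether the bytes are already valid UTF-8.
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Otherwise, assume the fallback encoding and transcode, dropping
  # anything that cannot be represented.
  raw.encode(Encoding::UTF_8, fallback, invalid: :replace, undef: :replace, replace: '')
end

puts to_utf8("caf\xC3\xA9".b)   # bytes that are already UTF-8
puts to_utf8("caf\xE9".b)       # Latin-1 bytes, converted via the fallback
```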
    

    I would definitely recommend reading the following post for more information:
    http://graysoftinc.com/character-encodings/ruby-19s-string
