Weird Characters encoding

Submitted by 心不动则不痛 on 2019-12-13 05:26:08

Question


I see weird behaviour in my params: they are passed as utf-8, but special characters are not handled well. Instead of 1 special character, I get 2 characters: the plain letter + the accent.

Parameters: {"name"=>"Mylène.png", "_cardbiz_session"=>"be1d5b7a2f27c7c4979ac4c16fe8fc82", "authenticity_token"=>"9vmJ02DjgKYCpoBNUcWwUlpxDXA8ddcoALHXyT6wrnM=", "asset"=>{"file"=>#<ActionDispatch::Http::UploadedFile:0x007f94d38d37d0 @original_filename="Mylène.png", @content_type="image/png", @headers="Content-Disposition: form-data; name=\"asset[file]\"; filename=\"Myle\xCC\x80ne.png\"\r\nContent-Type: image/png\r\n", @tempfile=#<File:/var/folders/q5/yvy_v9bn5wl_s5ccy_35qsmw0000gn/T/RackMultipart20130805-51100-1eh07dp>>}, "id"=>"copie-de-sm"}

I log this:

  • logger.debug file_name
  • logger.debug file_name.chars.map(&:to_s).inspect

Each time, same result:

  • Mylène
  • ["M", "y", "l", "e", "̀", "n", "e"]

Since I try to match the filename against existing names that are properly encoded in utf-8, you can see my problem ;)

  • Encodings are utf-8 everywhere.
  • Working under Ruby 1.9.3 and Rails 3.2.14.
  • Added #encoding: utf-8 at the top of every file involved.

If anyone has an idea, I'll take it!

I also opened an issue here: https://github.com/carrierwaveuploader/carrierwave/issues/1185 but I'm not sure whether it's a carrierwave issue or me missing something...


Answer 1:


Seems to be linked to Mac OS X.

https://www.ruby-forum.com/topic/4407424 explains it and refers to https://bugs.ruby-lang.org/issues/7267 for more details and discussion.

Mac OS X decomposes special characters, producing what Ruby calls utf8-mac instead of the usual composed utf-8...

Since you can't know the encoding of a file name, just presuppose it.

Thanks to our Linux guy, on whose machine it works properly. ;)

file_name.encode!('utf-8', 'utf-8-mac').chars.map(&:to_s)
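A minimal sketch of what this conversion does, assuming the name arrives decomposed as in the logged header (`e` followed by U+0300). The string literal is only a stand-in for the uploaded file name, and the non-destructive `encode` is used here so both forms can be compared:

```ruby
# Stand-in for the uploaded name: "e" followed by U+0300 (combining grave accent).
file_name = "Myle\u0300ne.png"

# Treat the string as UTF-8-MAC (decomposed) and transcode it to UTF-8 (composed).
composed = file_name.encode('utf-8', 'utf-8-mac')

p file_name.chars.to_a.length    # one character more than the composed form
p composed.chars.to_a.length
p composed == "Myl\u00E8ne.png"  # compare against the precomposed spelling
```

If the name is already composed, the conversion should leave it as it is, which is why presupposing utf-8-mac is a reasonable default here.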



Answer 2:


Perhaps you have a combining character and a problem with Unicode equivalence.

When I check the codepoints with:

#encoding: utf-8
Parameters =  {"name"=>"Mylène.png",}

p Parameters['name'].codepoints.to_a

I get Myl\u00E8ne.png, but I think that is a conversion artifact from copying the text. It would be helpful if you could provide a file with the raw data.

I expect you have an e followed by a combining grave accent.
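The raw header logged in the question supports this: it shows the bytes `\xCC\x80` right after the `e`, and those two bytes are the UTF-8 encoding of U+0300. A small sketch to check that, with the byte sequence copied from the `filename="..."` part of the log:

```ruby
# Bytes as they appear in the logged Content-Disposition header.
raw = "Myle\xCC\x80ne.png".dup.force_encoding('UTF-8')

p raw.valid_encoding?
# List every codepoint in U+XXXX notation.
p raw.codepoints.to_a.map { |c| format('U+%04X', c) }
# True when the combining grave accent is present as its own codepoint.
p raw.codepoints.to_a.include?(0x0300)
```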

The solution would be Unicode normalization. (Sorry, I don't know how to do it with Ruby. Perhaps somebody else has an answer for it.)


You found your problem, so you no longer need this.

But in the meantime I found a mechanism to normalize Unicode strings:

#encoding: utf-8
text = "Myl\u00E8ne.png" #"Mylène.png"
text2 = "Myle\u0300ne.png" #"Mylène.png"

puts text   #Mylène.png
puts text2  #Mylène.png

p text == text2 #false

#http://apidock.com/rails/ActiveSupport/Multibyte/Unicode/normalize
require 'active_support'
p text                                                   #"Myl\u00E8ne.png"
p ActiveSupport::Multibyte::Unicode.normalize(text, :d)  #"Myle\u0300ne.png"

p text2                                                  #"Myle\u0300ne.png"
p ActiveSupport::Multibyte::Unicode.normalize(text2, :c) #"Myl\u00E8ne.png"

Maybe there is an easier way, but so far I have found none.
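On Ruby 2.2 or later (so not the 1.9.3 from the question), `String#unicode_normalize` is built in and needs no ActiveSupport. A short sketch, where `existing_names` is a hypothetical list of stored names assumed to be in composed form:

```ruby
text  = "Myl\u00E8ne.png"   # composed: U+00E8
text2 = "Myle\u0300ne.png"  # decomposed: "e" + U+0300

p text2.unicode_normalize(:nfc) == text   # NFC composes
p text.unicode_normalize(:nfd) == text2   # NFD decomposes

# Hypothetical stored names, assumed to be kept in composed (NFC) form.
existing_names = ["Myl\u00E8ne.png", "other.png"]

# Normalize the incoming name before matching it against stored names.
p existing_names.include?(text2.unicode_normalize(:nfc))
```

Normalizing both the stored names and the incoming name to the same form before comparing avoids depending on which operating system produced the upload.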



Source: https://stackoverflow.com/questions/18076986/weird-characters-encoding
