How to write a BOM marker to a file in Ruby

感情败类 2021-01-01 23:15

I have some working code that relies on a crutch to add a BOM marker to a new file.

  #writing
  File.open name, 'w', 0644 do |file|
    file.write "\uFEFF"
    fil         


        
2 answers
  •  耶瑟儿~
    2021-01-01 23:59

    **** This answer led to a new gem: file_with_bom ****

    I had a similar problem in the past, so I extended File.open with additional encoding variants for the w-mode:

    class File
      BOM_LIST_hex = {
          Encoding::UTF_8    => "\xEF\xBB\xBF",     #U+FEFF encoded as UTF-8
          Encoding::UTF_16BE => "\xFE\xFF",         #"\uFEFF"
          Encoding::UTF_16LE => "\xFF\xFE",
          Encoding::UTF_32BE => "\x00\x00\xFE\xFF",
          Encoding::UTF_32LE => "\xFF\xFE\x00\x00", #little-endian: FF FE 00 00
        }
      BOM_LIST_hex.freeze
      def utf_bom_hex(encoding = external_encoding)
        BOM_LIST_hex[encoding]
      end
    
    class << self
      alias :open_old :open
      def open(filename, mode_string = 'r', options = {}, &block)
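        #check for bom-flag in mode_string (accepts both -bom and |bom);
        #work on a copy so the caller's string is not modified
        mode_string = mode_string.dup
        options[:bom] = true if mode_string.sub!(/[-|]bom/i,'')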
    
        f = open_old(filename, mode_string, options)
        if options[:bom]
          case mode_string
            #r|bom already standard since 1.9.2
            when /\Ar/   #read mode -> remove BOM
              #remove BOM
              bom = f.read(f.utf_bom_hex.bytesize) 
              #check, if it was really a bom
              if bom != f.utf_bom_hex.dup.force_encoding(bom.encoding) #dup: keep the shared table entry untouched
                f.rewind  #return to position 0 if BOM was no BOM
              end
            when /\Aw/  #write mode -> attach BOM
              #f is already open for writing; opening it again would leak the first handle
              f << f.utf_bom_hex.dup.force_encoding(f.external_encoding)
            end #mode_string
        end
    
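        if block_given?
          begin
            yield f
          ensure
            f.close #close the file even if the block raises
          end
        else
          f #without a block, return the open file as File.open normally does
        end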
      end
      end
    end #File
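
The write-mode idea can also be sketched without reopening File.open. This is only a minimal sketch, not the code of the file_with_bom gem: `write_with_bom` is a hypothetical helper name, the file names are made up, and it assumes the BOM is simply U+FEFF transcoded to the target encoding.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Ruby or of the file_with_bom gem):
# writes text to path in the given encoding, preceded by a BOM.
def write_with_bom(path, text, encoding = Encoding::UTF_8)
  # U+FEFF transcoded to the target encoding gives that encoding's BOM bytes.
  bom = "\uFEFF".encode(encoding)
  # Binary mode: the already-encoded bytes are written unchanged.
  File.open(path, 'wb') do |file|
    file.write(bom)
    file.write(text.encode(encoding))
  end
end

utf8_path    = File.join(Dir.tmpdir, 'sketch_utf8.txt')
utf16le_path = File.join(Dir.tmpdir, 'sketch_utf16le.txt')
write_with_bom(utf8_path, 'some content')
write_with_bom(utf16le_path, 'some content', Encoding::UTF_16LE)

p File.binread(utf8_path, 3).bytes     # => [239, 187, 191]
p File.binread(utf16le_path, 2).bytes  # => [255, 254]
```

Deriving the BOM from "\uFEFF" avoids maintaining a byte table by hand, at the cost of only working for encodings Ruby can transcode to.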
    

    Test code:

    EXAMPLE_TEXT = 'some content öäü'
    File.open("file_utf16le.txt", "w:utf-16le|bom"){|f| f << EXAMPLE_TEXT }
    File.open("file_utf16le.txt", "r:utf-16le|bom:utf-8"){|f| p f.read }
    File.open("file_utf16le.txt", "r:utf-16le:utf-8",  :bom => true ){|f| p f.read }
    File.open("file_utf16le.txt", "r:utf-16le:utf-8"){|f| p f.read }
    
    File.open("file_utf8.txt", "w:utf-8", :bom => true ){|f| f << EXAMPLE_TEXT }
    File.open("file_utf8.txt", "r:utf-8", :bom => true ){|f| p f.read }
    File.open("file_utf8.txt", "r:utf-8|bom",              ){|f| p f.read }
    File.open("file_utf8.txt", "r:utf-8",                     ){|f| p f.read }
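
The calls above only print the decoded text, so they do not show directly whether a BOM was written. One way to check is to look at the raw bytes. The following is a sketch under assumptions: `detect_bom` and the `BOMS` table are hypothetical names, and the sample file is created here with File.binwrite so the block runs on its own.

```ruby
require 'tmpdir'

# BOM byte sequences mapped to encoding names. Longer BOMs come first
# because the UTF-32LE BOM starts with the UTF-16LE one.
BOMS = {
  "\x00\x00\xFE\xFF".b => 'UTF-32BE',
  "\xFF\xFE\x00\x00".b => 'UTF-32LE',
  "\xEF\xBB\xBF".b     => 'UTF-8',
  "\xFE\xFF".b         => 'UTF-16BE',
  "\xFF\xFE".b         => 'UTF-16LE',
}.freeze

# Hypothetical helper: returns the encoding name implied by a leading BOM,
# or nil when the file does not start with one.
def detect_bom(path)
  head = File.binread(path, 4) || ''.b  # binread returns nil for an empty file
  BOMS.each { |bom, name| return name if head.start_with?(bom) }
  nil
end

path = File.join(Dir.tmpdir, 'bom_check_utf16le.txt')
File.binwrite(path, "\uFEFF".encode('UTF-16LE') + 'abc'.encode('UTF-16LE'))
p detect_bom(path)  # => "UTF-16LE"
```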
    

    Some remarks:

    • The code dates from pre-1.9 times (but it still works).
    • I used -bom as a BOM indicator (Ruby 1.9 uses |bom).

    Some fixes needed to make it better:

    • use |bom instead of -bom
    • use the standard r|bom for reading
    • make it work with both Ruby 1.8 and 1.9
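
For the reading side, the standard notation in current Ruby versions is spelled as a prefix on the external encoding, for example r:BOM|UTF-8. A minimal sketch of the difference, assuming a made-up file name and a file created here just for the demonstration:

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, 'bom_read_utf8.txt')
# Create a UTF-8 file that starts with a BOM (bytes EF BB BF).
File.binwrite(path, "\uFEFFsome content")

# Plain UTF-8 read: the BOM stays in the string as a leading U+FEFF.
with_bom = File.open(path, 'r:UTF-8') { |f| f.read }

# BOM|UTF-8: Ruby consumes a leading BOM, if present, before reading.
without_bom = File.open(path, 'r:BOM|UTF-8') { |f| f.read }

p with_bom.start_with?("\uFEFF")  # => true
p without_bom                     # => "some content"
```

The BOM| prefix only applies to reading; for writing, the BOM still has to be emitted explicitly.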

    Perhaps I will find some time tomorrow to refactor my code and provide it as a gem.
