Java Strings Character Encoding - For French - Dutch Locales


执念已碎 2021-01-13 16:26

I have the following piece of code

public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(Charset.defaultCha


      
      
        
3 answers

盖世英雄少女心 (original poster) 2021-01-13 17:25

Line #1 - the default character set on your system is windows-1252.

Line #2 - you created a String by encoding a String literal to UTF-8 bytes and then decoding those bytes as UTF-8.  The result is a correctly formed String, which can be output correctly using the windows-1252 encoding.

Line #3 - you created a String by encoding a string literal as windows-1252 and then decoding the bytes as UTF-8.  The UTF-8 decoder detected a sequence that cannot possibly be UTF-8 and replaced the offending bytes with the replacement character (U+FFFD), which is printed as a question mark "?" because windows-1252 has no mapping for it.  (The UTF-8 format says that any byte that has the top bit set to 1 is one byte of a multi-byte sequence.  But the windows-1252 encoding of an accented character such as "é" is just one byte long ... ergo, this is bad UTF-8.)
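The Line #3 case can be reproduced in isolation. The sketch below assumes the string literal contains "é" (the question's code is truncated, so this is inferred from the Line #4 explanation) and that the JVM provides the windows-1252 charset. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MalformedUtf8Demo {
    // Assumption: this JVM provides windows-1252 (Charset.forName throws otherwise).
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Encodes text as windows-1252, then decodes those bytes as UTF-8 (the mismatch). */
    static String encodeCp1252DecodeUtf8(String text) {
        byte[] bytes = text.getBytes(CP1252);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Encodes text as windows-1252, roughly what a windows-1252 console stream does. */
    static byte[] toCp1252(String text) {
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        // The accented letter is written as a Unicode escape so the result
        // does not depend on the encoding of this source file.
        String decoded = encodeCp1252DecodeUtf8("\u00e9");
        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // the replacement character U+FFFD
        }
        for (byte b : toCp1252(decoded)) {
            System.out.printf("0x%02X%n", b & 0xFF); // 0x3F is '?'
        }
    }
}
```

The second loop shows where the visible "?" comes from: U+FFFD has no windows-1252 representation, so the encoder substitutes its default replacement byte.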

Line #4 - you created a String by encoding in UTF-8 and then decoding in windows-1252.  In this case the decoding has not "failed", but it has produced garbage (aka mojibake).  The reason you got 2 characters of output is that the UTF-8 encoding of "é" is a 2-byte sequence, and windows-1252 decodes each of those bytes as a separate character.
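A minimal sketch of the Line #4 case, again assuming the literal is "é" and that windows-1252 is available in the JVM (the class and method names are hypothetical). The UTF-8 bytes of "é" are 0xC3 0xA9, which windows-1252 reads as the two characters "Ã" and "©".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Assumption: this JVM provides windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Encodes text as UTF-8, then decodes those bytes as windows-1252. */
    static String encodeUtf8DecodeCp1252(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    /** Reverses the mix-up: re-encode as windows-1252, decode as UTF-8. */
    static String undo(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = encodeUtf8DecodeCp1252("\u00e9");    // "é"
        System.out.println(garbled.length());                 // 2
        System.out.println(garbled.equals("\u00c3\u00a9"));   // true: "Ã©"
        System.out.println(undo(garbled).equals("\u00e9"));   // true
    }
}
```

The round trip back to "é" works here because both bytes happen to be defined in windows-1252; that is not guaranteed for every UTF-8 byte sequence.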

Line #5 - you created a String by encoding as windows-1252 and decoding as windows-1252.  This produces the correct output.



And the overall lesson is that if you encode characters to bytes with one character encoding and then decode them with a different character encoding, you are liable to get mangling of one form or another.
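One way to notice such a mismatch instead of silently getting mangled text is to decode strictly. The following is only a sketch, not something from the question: it uses a `CharsetDecoder` configured with `CodingErrorAction.REPORT`, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    // Assumption: this JVM provides windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns true if the bytes decode in the given charset without any error. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // the bytes are not valid in this charset
        }
    }

    public static void main(String[] args) {
        byte[] asCp1252 = "\u00e9".getBytes(CP1252);               // single byte 0xE9
        byte[] asUtf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // bytes 0xC3 0xA9

        System.out.println(decodesCleanly(asCp1252, StandardCharsets.UTF_8)); // false
        System.out.println(decodesCleanly(asUtf8, StandardCharsets.UTF_8));   // true
        System.out.println(decodesCleanly(asUtf8, CP1252));                   // true
    }
}
```

The last call shows the limit of this check: a Line #3 style mismatch is reported, but a Line #4 style mismatch is not, because those UTF-8 bytes are also valid windows-1252. The dependable fix is to use the same, explicitly named charset on both the encoding and the decoding side.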
    
             
                                                        
            
            
              
                