Encoding used in cast from char to byte


Take a look at the following C# code (function extracted from the BuildProtectedURLWithValidity function in http://wmsauth.org/examples):

byte[]


                      
3 answers

抹茶落季 (2020-12-18 02:19)

Casting between byte and char is like using the ISO-8859-1 encoding (= the first 256 characters of Unicode), except it silently loses information when encoding characters beyond U+00FF.


  And besides, is the char actually bigger than a byte (I'm guessing 2 bytes) and will actually omit the first byte?


Yes.  A C# char = UTF-16 code unit = 2 bytes.
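
To make the ISO-8859-1 comparison concrete, here is a minimal sketch. The class and helper names are made up for illustration, and the encoder is deliberately built with exception fallbacks so that it refuses characters it cannot represent instead of substituting something else. The characters are written as escapes so the source file encoding does not matter.

```csharp
using System;
using System.Text;

static class Latin1CastDemo
{
    // Strict ISO-8859-1: throws instead of substituting when a character does not fit.
    static readonly Encoding StrictLatin1 = Encoding.GetEncoding(
        "ISO-8859-1",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    // Hypothetical helper: true when the cast and the encoder produce the same single byte.
    public static bool CastMatchesLatin1(char c)
    {
        byte viaCast = unchecked((byte)c);   // keep only the low 8 bits
        try
        {
            byte[] viaEncoding = StrictLatin1.GetBytes(new[] { c });
            return viaEncoding.Length == 1 && viaEncoding[0] == viaCast;
        }
        catch (EncoderFallbackException)
        {
            // The encoder refuses the character; the cast would have truncated it silently.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CastMatchesLatin1('s'));        // U+0073
        Console.WriteLine(CastMatchesLatin1('\u00E9'));   // U+00E9, still within Latin-1
        Console.WriteLine(CastMatchesLatin1('\u044B'));   // U+044B, beyond U+00FF
    }
}
```

For the first two characters the cast and the encoder agree; for the third the encoder throws, while the cast simply drops the high byte.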
                                                                        
                                                        
            
            
              
                
鱼传尺愫 (2020-12-18 02:25)

A char is a 16-bit UTF-16 code unit. Casting a char to a byte yields the low byte of the character, but both Douglas and dan04 are wrong to say that the cast always quietly discards the high byte. If the high byte is nonzero, the result depends on whether the compiler option "Check for arithmetic overflow/underflow" is set:

using System;

namespace CharTest
{
    class Program
    {
        public static void Main(string[] args)
        {
            CharToByteTest('s');   // U+0073, fits in a byte
            CharToByteTest('ы');   // U+044B, high byte is nonzero

            Console.ReadLine();
        }

        // Casts c to byte and reports whether the cast threw.
        // No checked/unchecked keyword is used, so the compiler option decides.
        static void CharToByteTest(char c)
        {
            const string MsgTemplate = "Casting to byte character # {0}: {1}";

            string msgRes = "Success";
            try
            {
                byte b = (byte)c;
            }
            catch (OverflowException e)
            {
                msgRes = e.Message;
            }

            // (int)c rather than (Int16)c: the latter overflows above U+7FFF.
            Console.WriteLine(MsgTemplate, (int)c, msgRes);
        }
    }
}


Output with overflow checking:

Casting to byte character # 115: Success
Casting to byte character # 1099: Arithmetic operation resulted in an overflow.


Output without overflow checking:

Casting to byte character # 115: Success
Casting to byte character # 1099: Success
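
If the behaviour should not depend on a project setting, the C# checked and unchecked operators pin it down at the call site. The following is only a sketch; the class and method names are invented for illustration.

```csharp
using System;

static class CheckedCastDemo
{
    // Always truncates to the low 8 bits, whatever the compiler option says.
    public static byte Truncate(char c) => unchecked((byte)c);

    // Always throws OverflowException when c is above U+00FF.
    public static byte Strict(char c) => checked((byte)c);

    public static void Main()
    {
        char c = '\u044B';                // 'ы', code unit 1099

        Console.WriteLine(Truncate(c));   // 1099 % 256 = 75

        try
        {
            Strict(c);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow");
        }
    }
}
```

With either compiler setting, Truncate returns 75 for this character and Strict throws.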

                                                                        
                                                        
            
            
              
                
说谎 (2020-12-18 02:28)

The .NET Framework uses Unicode to represent all its characters and strings. The integer value of a char (which you may obtain by casting to int) is equivalent to its UTF-16 code unit. For characters in the Basic Multilingual Plane (which constitute the majority of characters you'll ever encounter), this value is the Unicode code point.


  The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure. The value of a Char object is its 16-bit numeric (ordinal) value.  — Char Structure
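
The difference between a code unit and a code point can be seen in a short sketch. The class and helper names are illustrative only, and the second sample character (U+1F600) is chosen arbitrarily as one that lies outside the Basic Multilingual Plane.

```csharp
using System;

static class CodeUnitDemo
{
    // Number of UTF-16 code units (char values) in the string.
    public static int CodeUnitCount(string s) => s.Length;

    // Unicode code point that starts at index 0, combining a surrogate pair if present.
    public static int FirstCodePoint(string s) => char.ConvertToUtf32(s, 0);

    public static void Main()
    {
        string bmp = "\u0144";          // 'ń', inside the BMP
        string astral = "\U0001F600";   // outside the BMP

        Console.WriteLine(CodeUnitCount(bmp));              // 1
        Console.WriteLine(FirstCodePoint(bmp));             // 324
        Console.WriteLine(CodeUnitCount(astral));           // 2
        Console.WriteLine(FirstCodePoint(astral));          // 128512 (0x1F600)
        Console.WriteLine(char.IsHighSurrogate(astral[0])); // True
    }
}
```

For the BMP character the char value and the code point coincide; for the other one, each char is only half of a surrogate pair.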


Casting a char to byte will result in data loss for any character whose value is larger than 255. Try running the following simple example to understand why:

char c1 = 'D';        // code point 68
byte b1 = (byte)c1;   // b1 is 68

char c2 = 'ń';        // code point 324
byte b2 = (byte)c2;   // b2 is 68 too!
                      // 324 % 256 == 68


Yes, you should definitely use Encoding.UTF8.GetBytes instead.
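
As a rough comparison of the two approaches, the sketch below converts the same two-character string once by casting each char and once with Encoding.UTF8.GetBytes. The class and method names are made up for illustration; this is not the code from the question.

```csharp
using System;
using System.Linq;
using System.Text;

static class Utf8Demo
{
    // Per-character cast: keeps only the low byte of each UTF-16 code unit.
    public static byte[] ViaCast(string s) =>
        s.Select(c => unchecked((byte)c)).ToArray();

    // Proper encoding: every character survives and can be decoded again.
    public static byte[] ViaUtf8(string s) => Encoding.UTF8.GetBytes(s);

    public static void Main()
    {
        string s = "D\u0144";   // "Dń"

        Console.WriteLine(BitConverter.ToString(ViaCast(s)));   // 44-44
        Console.WriteLine(BitConverter.ToString(ViaUtf8(s)));   // 44-C5-84

        // Only the UTF-8 bytes round-trip back to the original string.
        Console.WriteLine(Encoding.UTF8.GetString(ViaUtf8(s)) == s);   // True
    }
}
```

After the cast, the two different characters are indistinguishable, whereas the UTF-8 bytes decode back to the original string.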
                                                                        
                                                        
            
            
              
                