Is there a memory-efficient replacement of java.lang.String?

被撕碎了的回忆 2020-11-30 19:29
After reading this old article measuring the memory consumption of several object types, I was amazed to see how much memory Strings use in Java:

15 answers

孤独总比滥情好 (OP) 2020-11-30 20:24

Java chose UTF-16 as a compromise between speed and storage size. Processing UTF-8 data is much more of a PITA than processing UTF-16 data (e.g. when trying to find the position of character X in the byte array, how are you going to do that quickly if every character can take one, two, three, or four bytes, or up to six under the original UTF-8 definition? Ever thought about that? Going over the string byte by byte is not really fast, you see?). Of course UTF-32 would be the easiest to process, but it takes twice the storage space of UTF-16 for most text. Things have changed since the early Unicode days, though. Now certain characters need 4 bytes even when UTF-16 is used, and handling these correctly makes UTF-16 almost as bad as UTF-8.
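
To make the indexing point concrete, here is a minimal sketch, assuming the input is valid UTF-8. The class `Utf8Scan` and the method `byteOffsetOfCodePoint` are names invented for this illustration, not part of the JDK. It shows that locating the n-th character in a UTF-8 byte array needs a linear scan, and that `String` has the same problem as soon as surrogate pairs are involved.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Scan {
    // Hypothetical helper: returns the byte index where the n-th code point
    // (0-based) starts, or -1 if the array holds fewer code points.
    // Assumes the array contains valid UTF-8.
    static int byteOffsetOfCodePoint(byte[] utf8, int n) {
        int seen = 0;
        for (int i = 0; i < utf8.length; i++) {
            // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
            if ((utf8[i] & 0xC0) != 0x80) {
                if (seen == n) {
                    return i;
                }
                seen++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 'a', U+00E9, U+20AC, U+1F600 (a surrogate pair in UTF-16), 'b'
        String s = "a\u00E9\u20AC\uD83D\uDE00b";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        // UTF-8: the last character starts at byte 10, found only by scanning.
        System.out.println(byteOffsetOfCodePoint(utf8, 4));

        // UTF-16: 6 code units but only 5 code points, so charAt(4) is NOT
        // the fifth character; offsetByCodePoints has to walk the string too.
        System.out.println(s.length());
        System.out.println(s.codePointCount(0, s.length()));
        System.out.println(s.offsetByCodePoints(0, 4));
    }
}
```

`charAt` stays constant-time only because it indexes UTF-16 code units rather than characters, which is the "almost as bad as UTF-8" situation described above.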

Anyway, rest assured that if you implement a String class with internal storage that uses UTF-8, you might save some memory, but you will lose processing speed for many string methods. Also, your argument takes far too limited a point of view. It will not hold true for someone in Japan, since Japanese characters are not smaller in UTF-8 than in UTF-16 (they typically take 3 bytes in UTF-8, while they take only 2 bytes in UTF-16). I don't understand why programmers in a global world like today's, with the omnipresent Internet, still talk about "western languages" as if that were all that counts, as if only the western world had computers and the rest of it lived in caves. Sooner or later any application gets bitten by the fact that it fails to process non-western characters effectively.
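
The size claim is easy to check. The sketch below compares encoded lengths for an ASCII word and a Japanese word. The class `EncodingSize` and its method names are made up for this example, and `UTF_16BE` is used so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSize {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (big-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        // Five hiragana characters ("konnichiwa"), written as escapes so the
        // example does not depend on the source file encoding.
        String japanese = "\u3053\u3093\u306B\u3061\u306F";

        // ASCII text: 5 bytes in UTF-8, 10 bytes in UTF-16.
        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii));
        // Japanese text: 15 bytes in UTF-8, 10 bytes in UTF-16.
        System.out.println(utf8Bytes(japanese) + " vs " + utf16Bytes(japanese));
    }
}
```

So a UTF-8 backed string halves the footprint of ASCII-only text but makes this Japanese sample 50% larger.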