Does Unicode have a defined maximum number of code points?

死守一世寂寞 2020-12-15 07:31

I have read many articles trying to find out the maximum number of Unicode code points, but I did not find a definitive answer.

I understood that the Unicode co

3 answers
  •  不思量自难忘°
    2020-12-15 08:16

    I wrote a small routine that prints a very long table on screen, from start to start + range, where both values can be set by the user. This is the snippet:

    function getVal()
    			{
    				var start = parseInt(document.getElementById('start').value, 10);
    				var range = parseInt(document.getElementById('range').value, 10);
    				var end = start + range;
    				return [start, range, end];
    			}
    
    		
    			function next()
    			{
    				var values = getVal();
    				document.getElementById('start').value = values[2];
    				document.getElementById('ok').click();
    			}
    			
    			function prev()
    			{
    				var values = getVal();
    				document.getElementById('start').value = values[0] - values[1];
    				document.getElementById('ok').click();
    			}
    			
    			function renderCharCodeTable()
    			{
    				var values = getVal();
    				var start = values[0];
    				var end = values[2];
    
    				const MINSTART = 0; // Allowed range
    				const MAXEND = 4294967294; // Allowed range (2^32 - 2)
    				
    				start = start < MINSTART ? MINSTART : start;
    				end = end < MINSTART ? (MINSTART + 1) : end;
    
    				start = start > MAXEND ? (MAXEND - 1) : start;
    				end = end >= MAXEND ? (MAXEND + 1) : end;
    				
    				var tr = [];
    				
    				var unicodeCharSet = document.getElementById('unicodeCharSet');
    
    				var cCode;
    				var cPoint;
    				for (var c = start; c < end; c++)
    				{
    					try
    					{
    						cCode = String.fromCharCode(c);
    					}
    					catch (e)
    					{
    						cCode = 'fromCharCode max val exceeded';
    					}
    					
    					try
    					{
    						cPoint = String.fromCodePoint(c);
    					}
    					catch (e)
    					{
    						cPoint = 'fromCodePoint max val exceeded';
    					}
    					
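    					// Append one table row; assigning tr[c] would leave a sparse array when start > 0.
    					// The row markup is reconstructed from the three table columns.
    					tr.push('<tr><td>' + c + '</td><td>' + cCode + '</td><td>' + cPoint + '</td></tr>');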
    				}
    				unicodeCharSet.innerHTML = tr.join('');
    			}
    			
    			function startRender()
    			{
    				setTimeout(renderCharCodeTable, 100);
    				console.time('renderCharCodeTable');
    			}
    			// Pass the function itself: startRender() would call it at once and register undefined
    			window.addEventListener("load", startRender);
    
    /* Stylesheet of the snippet */
    body
    		{
    			margin-bottom: 50%;
    		}
    		
    		form
    		{
    			position: fixed;
    		}
    		
    		table *
    		{
    			border: 1px solid black;
    			font-size: 1em;
    			text-align: center;
    		}
    		
    		table
    		{
    			margin: auto;
    			border-collapse: collapse;
    		}
    		
    		td:hover
    		{
    			padding-bottom: 1.5em;
    			padding-top: 1.5em;
    		}
    		
    		tbody > tr:hover
    		{
    			font-size: 5em;
    		}
    	
    	
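    (Snippet form: a "Start Unicode" input, a "Show [n] symbols at once" input, and a table with the columns CODE, Symbol fromCharCode and Symbol fromCodePoint, which shows "Rendering..." until the script fills it.)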

    Run it once, then set start to a very high number, just below the MAXEND constant, and run it again. This is what I got:

        code        equivalent symbol
    {~~~ first execution output example ~~~~~}
    
    0   
    1   
    2   
    3   
    4   
    5   
    6   
    7   
    8   
    9   
    10  
    11  
    12  
    13  
    14  
    15  
    16  
    17  
    18  
    19  
    20  
    21  
    22  
    23  
    24  
    25  
    26  
    27  
    28  
    29  
    30  
    31  
    32  
    33  !
    34  "
    35  #
    36  $
    37  %
    38  &
    39  '
    40  (
    41  )
    42  *
    43  +
    44  ,
    45  -
    46  .
    47  /
    48  0
    49  1
    50  2
    51  3
    52  4
    53  5
    54  6
    55  7
    56  8
    57  9
    {~~~ second execution output example ~~~~~}
    4294967275  →
    4294967276  ↓
    4294967277  ■
    4294967278  ○
    4294967279  ￯
    4294967280  ￰
    4294967281  ￱
    4294967282  ￲
    4294967283  ￳
    4294967284  ￴
    4294967285  ￵
    4294967286  ￶
    4294967287  ￷
    4294967288  ￸
    4294967289  
    4294967290  
    4294967291  
    4294967292  
    4294967293  �
    4294967294  
    

    The output is of course truncated (between the first and the second execution) because it is too long.

    After 4294967294 (2^32 - 2, the value of MAXEND) the routine stops, so at first I took this to be the largest value in the Unicode character table. The edit below shows that this reading is too generous: the table only looks that long because the same symbols repeat many times at different points between 0 and 4294967294. As other answers say, not every code has a visible symbol; many of them render as empty, as the example shows.

    Edit: improvements

    (thanks @duskwuff)

    Now it is also possible to compare the behavior of String.fromCharCode and String.fromCodePoint. The first accepts values all the way up to 4294967294, but its output repeats every 65536 values (2^16), because it only uses the low 16 bits of its argument. The second stops working after code 1114111 (0x10FFFF): since the list starts at 0, there are 1,114,112 Unicode code points in total, although, as other answers say, not all of them are assigned to a character. Also remember that displaying a given character requires a font that defines it; otherwise you will see an empty glyph or an empty square.
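
    The two behaviors can be checked without rendering the whole table. The following is only a minimal sketch: the helper names charCodeWraps and isValidCodePoint and the constant MAX_CODE_POINT are made up for illustration and are not part of the snippet above.

```javascript
// 0x10FFFF (1114111) is the highest Unicode code point.
const MAX_CODE_POINT = 0x10FFFF;

// String.fromCharCode keeps only the low 16 bits of its argument,
// so values that are 65536 apart give the same UTF-16 code unit.
function charCodeWraps(n) {
  return String.fromCharCode(n) === String.fromCharCode(n % 65536);
}

// String.fromCodePoint throws a RangeError outside 0..0x10FFFF,
// so catching the error tells us whether n is in range.
function isValidCodePoint(n) {
  try {
    String.fromCodePoint(n);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(charCodeWraps(4294967275));            // true
console.log(String.fromCharCode(65 + 65536));      // "A"
console.log(isValidCodePoint(MAX_CODE_POINT));     // true
console.log(isValidCodePoint(MAX_CODE_POINT + 1)); // false
```

    This is why the second execution above shows ordinary symbols next to ten-digit codes: 4294967275 and 65515 end up as the same 16-bit value.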

    Note:

    I have noticed that on some Android systems, in Chrome for Android, String.fromCodePoint throws an error for every code point.
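
    One possible workaround, assuming the error comes from String.fromCodePoint being missing or broken in that browser (I have not confirmed the cause), is to build the string from UTF-16 surrogate pairs with String.fromCharCode. The names codePointToString and toSymbol below are hypothetical helpers, not part of the snippet above.

```javascript
// Hypothetical fallback: converts one code point to a string
// using only String.fromCharCode.
function codePointToString(cp) {
  // Reject non-integers and anything outside 0..0x10FFFF.
  if (typeof cp !== 'number' || Math.floor(cp) !== cp || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError('Invalid code point ' + cp);
  }
  // Code points up to 0xFFFF fit in a single UTF-16 code unit.
  if (cp <= 0xFFFF) {
    return String.fromCharCode(cp);
  }
  // Anything above is encoded as a high and a low surrogate.
  var offset = cp - 0x10000;
  var high = 0xD800 + (offset >> 10);
  var low = 0xDC00 + (offset & 0x3FF);
  return String.fromCharCode(high, low);
}

// Use the native method when it exists, the fallback otherwise.
function toSymbol(cp) {
  return typeof String.fromCodePoint === 'function'
    ? String.fromCodePoint(cp)
    : codePointToString(cp);
}

console.log(codePointToString(0x1F600) === '\uD83D\uDE00'); // true
```

    In renderCharCodeTable, the call to String.fromCodePoint(c) could then be replaced by toSymbol(c), keeping the existing try/catch for values above 1114111.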
