In what JS engines, specifically, are toLowerCase & toUpperCase locale-sensitive?

迷失自我 2020-12-03 16:59

In the code of some libraries (e.g. AngularJS, the link leads to the specific lines in the code), I can see that custom case-conversion functions are used instead of the sta

2 Answers
  •  臣服心动
    2020-12-03 17:43

    Note: I couldn't test any of this myself!


    As per the ECMAScript specification:

    String.prototype.toLowerCase ( )

    [...]

    For the purposes of this operation, the 16-bit code units of the Strings are treated as code points in the Unicode Basic Multilingual Plane. Surrogate code points are directly transferred from S to L without any mapping.

    The result must be derived according to the case mappings in the Unicode character database (this explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later).

    [...]
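    To make the quoted rule concrete, here is a minimal sketch of what locale-independent mapping looks like in practice. It assumes an engine that applies the full Unicode mappings (including the special-casing file) as the specification text above requires; `codePoints` is a hypothetical helper introduced only for display.

```javascript
// Sketch: locale-independent case mapping with toLowerCase()/toUpperCase().
// Assumption: the engine applies the full Unicode case mappings, including
// the unconditional entries of the special-casing file.

// Hypothetical helper: lists the code points of a string as "U+XXXX" labels.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const plainLower = 'I'.toLowerCase();        // ASCII "I" maps to ASCII "i"
const dottedLower = '\u0130'.toLowerCase();  // I WITH DOT ABOVE -> "i" + U+0307
const sharpUpper = '\u00DF'.toUpperCase();   // sharp s -> "SS" (one-to-many)

console.log(codePoints(plainLower));
console.log(codePoints(dottedLower));
console.log(codePoints(sharpUpper));
```

    Note that the one-to-many results change the string length, which is one reason a library might prefer its own ASCII-only conversion.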

    String.prototype.toLocaleLowerCase ( )

    This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment’s current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

    [...]
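    The difference the specification describes can be sketched without touching OS settings, under one assumption that is not part of the quoted text: that the engine implements the ECMAScript Internationalization API (ECMA-402), in which `toLocaleLowerCase()` and `toLocaleUpperCase()` accept an explicit locale tag. `compareCasing` is a hypothetical helper name.

```javascript
// Sketch: locale-independent vs. locale-aware case conversion.
// Assumption: the engine supports ECMA-402, so the toLocale* methods accept
// a locale tag. Without that support the argument may be ignored.

// Hypothetical helper: converts "I" and "i" both ways for a given locale tag.
function compareCasing(locale) {
  return {
    lowerDefault: 'I'.toLowerCase(),             // never consults the locale
    lowerLocale: 'I'.toLocaleLowerCase(locale),  // applies tailored rules
    upperDefault: 'i'.toUpperCase(),
    upperLocale: 'i'.toLocaleUpperCase(locale),
  };
}

const english = compareCasing('en');
const turkish = compareCasing('tr');

console.log(english);
console.log(turkish);
```

    In an engine with that support, only the `Locale` fields should differ between the two results; the `Default` fields stay the same regardless of the tag passed in.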

    And as per the Unicode Character Database special-casing file:

    [...]

    Format

    The entries in this file are in the following machine-readable format:

    <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>

    Unconditional mappings

    [...]

    Preserve canonical equivalence for I with dot. Turkic is handled below.

    0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

    [...]

    Language-Sensitive Mappings

    These are characters whose full case mappings depend on language and perhaps also context (which characters come before or after). For more information see the header of this file and the Unicode Standard.

    Lithuanian

    Lithuanian retains the dot in a lowercase i when followed by accents.

    Remove DOT ABOVE after "i" with upper or titlecase

    0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE

    Introduce an explicit dot above when lowercasing capital I's and J's whenever there are more accents above. (of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)

    0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
    004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J
    012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK
    00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
    00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE
    0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE

    Turkish and Azeri

    I and i-dotless; I-dot and i are case pairs in Turkish and Azeri. The following rules handle those cases.

    0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
    0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

    When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i. This matches the behavior of the canonically equivalent I-dot_above

    0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
    0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

    When lowercasing, unless an I is before a dot_above, it turns into a dotless i.

    0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
    0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I

    When uppercasing, i turns into a dotted capital I

    0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
    0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

    Note: the following case is already in the UnicodeData.txt file.

    0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I

    EOF

    Also, as per JavaScript for Absolute Beginners (by Terry McNavage):

    > "I".toLowerCase() // "i"
    > "i".toUpperCase() // "I"
    > "I".toLocaleLowerCase() // "<dotless-i>"
    > "i".toLocaleUpperCase() // "<dotted-I>"

    Note: toLocaleLowerCase() and toLocaleUpperCase() convert case based on your OS settings. You'd have to change those settings to Turkish for the previous sample to work. Or just take my word for it!

    And as per bobince's comment on the question "Convert JavaScript String to be all lower case?":

    Accept-Language and navigator.language are two completely separate settings. Accept-Language reflects the user's chosen preferences for what languages they want to receive in web pages (and this setting is unfortunately inaccessible to JS). navigator.language merely reflects which localisation of the web browser was installed, and should generally not be used for anything. Both of these values are unrelated to the system locale, which is the bit that decides what toLocaleLowerCase() will do; that's an OS-level setting out of scope of the browser's prefs.

    So setting lang="tr-TR" on the html element won't give you a realistic test case, since reproducing the special-casing example requires an OS-level setting.

    I think that only lowercasing dotted-I or uppercasing dotless-i would be locale-specific when using toLowerCase() or toUpperCase().

    Going by those credible/official sources, I think you're right: 'i' !== 'I'.toLowerCase() would always evaluate to false.

    But, as I said, I couldn't test it here.

<div class="layui-col-md4"> <!-- Hot topics --> <dl class="fly-panel fly-list-one"> <dt class="fly-panel-title">Hot topics</dt> <!-- Shows this week's 10 most discussed questions -->