What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?

前端 未结 10 1795
孤城傲影
孤城傲影 2020-11-28 03:02

I was solving some problem on codeforces. Normally I first check if the character is upper or lower English letter then subtract or add 32 to convert it to the

10条回答
  •  孤城傲影
    2020-11-28 03:38

    Allow me to say that this is -- although it seems smart -- a really, really stupid hack. If someone recommends this to you in 2019, hit him. Hit him as hard as you can.
    You can, of course, do it in your own software that you and nobody else uses if you know that you will never use any language but English anyway. Otherwise, no go.

    The hack was arguable "OK" some 30-35 years ago when computers didn't really do much but English in ASCII, and maybe one or two major European languages. But... no longer so.

    The hack works because US-Latin upper- and lowercases are exactly 0x20 apart from each other and appear in the same order, which is just one bit of difference. Which, in fact, this bit hack, toggles.

    Now, the people creating code pages for Western Europe, and later the Unicode consortium, were smart enough to keep this scheme for e.g. German Umlauts and French-accented Vowels. Not so for ß which (until someone convinced the Unicode consortium in 2017, and a large Fake News print magazine wrote about it, actually convincing the Duden -- no comment on that) don't even exist as a versal (transforms to SS). Now it does exist as versal, but the two are 0x1DBF positions apart, not 0x20.

    The implementors were, however, not considerate enough to keep this going. For example, if you apply your hack in some East European languages or the like (I wouldn't know about Cyrillic), you will get a nasty surprise. All those "hatchet" characters are examples of that, lowercase and uppercase are one apart. The hack thus does not work properly there.

    There's much more to consider, for example, some characters do not simply transform from lower- to uppercase at all (they're replaced with different sequences), or they may change form (requiring different code points).

    Do not even think about what this hack will do to stuff like Thai or Chinese (it'll just give you complete nonsense).

    Saving a couple of hundred CPU cycles may have been very worthwhile 30 years ago, but nowadays, there is really no excuse for converting a string properly. There are library functions for performing this non-trivial task.
    The time taken to convert several dozens kilobytes of text properly is negligible nowadays.

提交回复
热议问题