Greek and text-transform:uppercase

Question

I've written a web application that contains translations in several languages (one of them being Greek).

When a certain translation is displayed in the title, the design rule is that the text must be uppercased. That is fine for the other languages, but when it comes to Greek, browsers don't know what to do with the accents (see this), so they display the wrong uppercased string.

I've ported the patch linked above to JavaScript, run some use cases against it, and it works. Now all I have to do is this:

Without adding an uppercase class to every element that needs to be uppercased (there are quite a few), can I query the DOM using a computed style property? I.e. give me all the elements that have a computed text-transform: uppercase.

Answer 1:

I strongly suggest not using jQuery for this. Instead do this:

var e = document.getElementsByTagName('*'), l = e.length, i;
// Fallback for old IE, which exposes currentStyle instead of getComputedStyle.
if (typeof window.getComputedStyle == "undefined")
    window.getComputedStyle = function(e) { return e.currentStyle; };
for (i = 0; i < l; i++) {
    if (window.getComputedStyle(e[i]).textTransform == "uppercase") {
        // do stuff with e[i] here.
    }
}

Tested with 10,000 elements, of which 2,500 had "uppercase" text-transform.

jQuery processed in 595ms
JS processed in 60ms

So plain JavaScript is approximately 10 times faster at this than jQuery.

EDIT: Another test, this time with 100,000 elements:

jQuery failed: TypeError: Object doesn't support property or method 'each'
JS processed in 577ms

Answer 2:

The solution to this problem is described just above example 3 here.

This is an example that should work in any browser (tested only in Firefox 25).

HTML:

<body>
  <p id="withlang" lang="el">κεφαλαία με μετατροπή σύμφωνα με την γλώσσα</p>
  <p id="withoutlang">κεφαλαία με μετατροπή σύμφωνα με αντιστοιχίσεις unicode</p>
  <p id="withlangsmall" lang="el">μικρά κεφαλαία με μετατροπή σύμφωνα με την γλώσσα</p>
  <p id="withoutlangsmall">μικρά κεφαλαία με μετατροπή σύμφωνα με αντιστοιχίσεις unicode</p>
</body>

CSS:

#withlang, #withoutlang{
  text-transform: uppercase;
}

#withlangsmall, #withoutlangsmall{
  font-variant: small-caps;
}

You can also set the lang attribute at a higher level, for example on the body tag.

HTML:

<body lang="el">
  <p id="withlang">κεφαλαία με μετατροπή σύμφωνα με την γλώσσα</p>
  <p id="withlangsmall">μικρά κεφαλαία με μετατροπή σύμφωνα με την γλώσσα</p>
</body>

CSS:

#withlang{
  text-transform: uppercase;
}

#withlangsmall{
  font-variant: small-caps;
}
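
For text that is uppercased in script rather than in CSS, a roughly analogous language-aware route is String.prototype.toLocaleUpperCase with a language tag. This is a minimal sketch, assuming an engine with locale-sensitive case mapping; the helper name upperForLang is hypothetical, and an engine without that support may simply keep the accents:

```javascript
// Hypothetical helper: uppercase a string using the case rules of a language tag.
// Assumption: the engine implements locale-sensitive case mapping; engines
// without it fall back to the generic Unicode mapping.
function upperForLang(text, lang) {
  return text.toLocaleUpperCase(lang);
}

var sample = '\u03AC\u03B4\u03B9\u03BA\u03BF\u03C2'; // "άδικος"
console.log(upperForLang(sample, 'el')); // Greek rules, where supported
console.log(sample.toUpperCase());       // generic Unicode mapping, keeps the accent
```

Comparing the two logged strings in the target browsers shows whether the language-aware mapping is actually available there.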

Answer 3:

I am using this PHP function:

function toUpper($str) {
    $search = array('Ά', 'Έ', 'Ί', 'Ή', 'Ύ', 'Ό', 'Ώ');
    $replace = array('Α', 'Ε', 'Ι', 'Η', 'Υ', 'Ο', 'Ω');
    $str = mb_strtoupper($str, "UTF-8");
    return str_replace($search, $replace, $str);
}
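
The same idea can be sketched on the client side in JavaScript: uppercase first, then swap each accented capital for its plain form. The function name toUpperGreek and the map name are hypothetical, and, like the PHP version, this sketch does not handle the diaeresis cases:

```javascript
// Hypothetical JavaScript counterpart of the PHP function above.
// Maps the seven accented Greek capitals to their unaccented forms.
var ACCENTED_TO_PLAIN = {
  '\u0386': '\u0391', // Ά -> Α
  '\u0388': '\u0395', // Έ -> Ε
  '\u0389': '\u0397', // Ή -> Η
  '\u038A': '\u0399', // Ί -> Ι
  '\u038C': '\u039F', // Ό -> Ο
  '\u038E': '\u03A5', // Ύ -> Υ
  '\u038F': '\u03A9'  // Ώ -> Ω
};

function toUpperGreek(str) {
  // Uppercase with the generic mapping, then strip the accents.
  return str.toUpperCase().replace(
    /[\u0386\u0388-\u038A\u038C\u038E\u038F]/g,
    function (ch) { return ACCENTED_TO_PLAIN[ch]; }
  );
}
```

Text in other scripts passes through unchanged apart from the ordinary toUpperCase() conversion.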

Answer 4:

OK, just for reference, here's my solution so far:

GREEK_CHARS = {
  LOWER_ALPHA                : 0x03B1
  LOWER_ALPHA_ACC            : 0x03AC
  LOWER_EPSILON              : 0x03B5
  LOWER_EPSILON_ACC          : 0x03AD
  LOWER_ETA                  : 0x03B7
  LOWER_ETA_ACC              : 0x03AE
  LOWER_IOTA                 : 0x03B9
  LOWER_IOTA_ACC             : 0x03AF
  LOWER_IOTA_ACC_DIAERESIS   : 0x0390
  LOWER_OMICRON              : 0x03BF
  LOWER_OMICRON_ACC          : 0x03CC
  LOWER_UPSILON              : 0x03C5
  LOWER_UPSILON_ACC          : 0x03CD
  LOWER_UPSILON_ACC_DIAERESIS: 0x03B0
  LOWER_OMEGA_ACC            : 0x03CE
  UPPER_ALPHA                : 0x0391
  UPPER_EPSILON              : 0x0395
  UPPER_ETA                  : 0x0397
  UPPER_IOTA                 : 0x0399
  UPPER_IOTA_DIAERESIS       : 0x03AA
  UPPER_OMICRON              : 0x039F
  UPPER_UPSILON              : 0x03A5
  UPPER_UPSILON_DIAERESIS    : 0x03AB
  UPPER_OMEGA                : 0x03A9
  UPPER_ALPHA_ACC            : 0x0386
  UPPER_EPSILON_ACC          : 0x0388
  UPPER_ETA_ACC              : 0x0389
  UPPER_IOTA_ACC             : 0x038A
  UPPER_OMICRON_ACC          : 0x038C
  UPPER_UPSILON_ACC          : 0x038E
  UPPER_OMEGA_ACC            : 0x038F
  COMBINING_ACUTE_ACCENT           : 0x0301
  COMBINING_DIAERESIS              : 0x0308
  COMBINING_ACUTE_TONE_MARK        : 0x0341
  COMBINING_GREEK_DIALYTIKA_TONOS  : 0x0344
}

String::toUpperCaseWithoutGreek = String::toUpperCase
String::toUpperCase = ->
  newStringCharCodes  = []
  insideTag           = false
  for char, idx in this
    insideTag = true if char == '<'
    insideTag = false if char == '>'
    charCode      = char.charCodeAt(0)

    if insideTag
      newStringCharCodes.push charCode
      continue

    prev          = if idx > 0 then newStringCharCodes[idx-1] else GREEK_CHARS.UPPER_ALPHA
    prevPrev      = if idx > 1 then newStringCharCodes[idx-2] else GREEK_CHARS.UPPER_ALPHA
    prevPrevPrev  = if idx > 2 then newStringCharCodes[idx-3] else GREEK_CHARS.UPPER_ALPHA

    switch charCode
      when GREEK_CHARS.LOWER_ALPHA_ACC, GREEK_CHARS.UPPER_ALPHA_ACC
        newStringCharCodes.push GREEK_CHARS.UPPER_ALPHA
      when GREEK_CHARS.LOWER_EPSILON_ACC, GREEK_CHARS.UPPER_EPSILON_ACC
        newStringCharCodes.push GREEK_CHARS.UPPER_EPSILON
      when GREEK_CHARS.LOWER_ETA_ACC, GREEK_CHARS.UPPER_ETA_ACC
        newStringCharCodes.push GREEK_CHARS.UPPER_ETA
      when GREEK_CHARS.LOWER_IOTA_ACC, GREEK_CHARS.UPPER_IOTA_ACC
        newStringCharCodes.push GREEK_CHARS.UPPER_IOTA
      when GREEK_CHARS.LOWER_IOTA_ACC_DIAERESIS
        newStringCharCodes.push GREEK_CHARS.UPPER_IOTA_DIAERESIS
      when GREEK_CHARS.LOWER_OMICRON_ACC, GREEK_CHARS.UPPER_OMICRON_ACC
        newStringCharCodes.push GREEK_CHARS.UPPER_OMICRON
      when GREEK_CHARS.LOWER_UPSILON_ACC, GREEK_CHARS.UPPER_UPSILON_ACC
        newStringCharCodes.push GREEK_CHARS.UPPER_UPSILON
      when GREEK_CHARS.LOWER_UPSILON_ACC_DIAERESIS
        newStringCharCodes.push GREEK_CHARS.UPPER_UPSILON_DIAERESIS
      when GREEK_CHARS.LOWER_OMEGA_ACC, GREEK_CHARS.UPPER_OMEGA_ACC
        newStringCharCodes.push GREEK_CHARS.UPPER_OMEGA

      when GREEK_CHARS.LOWER_IOTA
        switch prev
          when GREEK_CHARS.LOWER_ALPHA_ACC, GREEK_CHARS.LOWER_EPSILON_ACC, GREEK_CHARS.LOWER_OMICRON_ACC
            newStringCharCodes.push GREEK_CHARS.UPPER_IOTA_DIAERESIS
          when GREEK_CHARS.LOWER_UPSILON_ACC
            if prevPrev == GREEK_CHARS.LOWER_OMICRON
              newStringCharCodes.push GREEK_CHARS.UPPER_IOTA
            else
              newStringCharCodes.push GREEK_CHARS.UPPER_IOTA_DIAERESIS
          when GREEK_CHARS.COMBINING_ACUTE_ACCENT, GREEK_CHARS.COMBINING_ACUTE_TONE_MARK
            switch prevPrev
              when GREEK_CHARS.LOWER_ALPHA, GREEK_CHARS.LOWER_EPSILON, GREEK_CHARS.LOWER_OMICRON
                newStringCharCodes.push GREEK_CHARS.UPPER_IOTA_DIAERESIS
              when GREEK_CHARS.LOWER_UPSILON
                if prevPrevPrev == GREEK_CHARS.LOWER_OMICRON
                  newStringCharCodes.push GREEK_CHARS.UPPER_IOTA
                else
                  newStringCharCodes.push GREEK_CHARS.UPPER_IOTA_DIAERESIS
              else
                newStringCharCodes.push GREEK_CHARS.UPPER_IOTA
          else
            newStringCharCodes.push GREEK_CHARS.UPPER_IOTA

      when GREEK_CHARS.LOWER_UPSILON
        switch prev
          when GREEK_CHARS.LOWER_ALPHA_ACC, GREEK_CHARS.LOWER_EPSILON_ACC, GREEK_CHARS.LOWER_ETA_ACC, GREEK_CHARS.LOWER_OMICRON_ACC
            newStringCharCodes.push GREEK_CHARS.UPPER_UPSILON_DIAERESIS
          when GREEK_CHARS.COMBINING_ACUTE_ACCENT, GREEK_CHARS.COMBINING_ACUTE_TONE_MARK
            switch prevPrev
              when GREEK_CHARS.LOWER_ALPHA, GREEK_CHARS.LOWER_EPSILON, GREEK_CHARS.LOWER_ETA, GREEK_CHARS.LOWER_OMICRON
                newStringCharCodes.push GREEK_CHARS.UPPER_UPSILON_DIAERESIS
              else
                newStringCharCodes.push GREEK_CHARS.UPPER_UPSILON
          else
            newStringCharCodes.push GREEK_CHARS.UPPER_UPSILON

      when GREEK_CHARS.COMBINING_GREEK_DIALYTIKA_TONOS
        newStringCharCodes.push GREEK_CHARS.COMBINING_DIAERESIS
      when GREEK_CHARS.COMBINING_ACUTE_ACCENT, GREEK_CHARS.COMBINING_ACUTE_TONE_MARK
        if prev < GREEK_CHARS.LOWER_OMEGA_ACC && prev > GREEK_CHARS.UPPER_ALPHA_ACC
          newStringCharCodes.push null
      else
        newStringCharCodes.push(String.fromCharCode(charCode).toUpperCaseWithoutGreek().charCodeAt(0))

  String.fromCharCode.apply(null, newStringCharCodes)

This is a CoffeeScript adaptation of the patch supplied in the bug above.

Here's what I do after a view is rendered:

# Fix greek uppercase.
[].concat($('*').get()).filter((elm) ->
  window.getComputedStyle(elm).getPropertyValue('text-transform') == "uppercase"
).forEach((elm) ->
  if elm.value
    elm.value = elm.value.toUpperCase()
  else
    $elm = $(elm)
    $elm.html($elm.html().toUpperCase())
)

This is not very nice, by any stretch of the imagination, but it works.

Two things I shouldn't be doing here, and might change: hijacking toUpperCase() and having special rules to avoid parsing tags. Still open to better suggestions!
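
One possible way around both of those concerns, as a minimal sketch: walk the text nodes of an element and pass each one through a separate conversion function, so that no markup is ever parsed and String.prototype stays untouched. The helper name transformTextNodes is hypothetical and is not part of the code above:

```javascript
// Hypothetical helper: apply `transform` to every text node below `node`,
// leaving tag names and attributes untouched.
function transformTextNodes(node, transform) {
  if (node.nodeType === 3) { // 3 === Node.TEXT_NODE
    node.nodeValue = transform(node.nodeValue);
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    transformTextNodes(children[i], transform);
  }
}

// Browser usage, assuming `elm` is an element whose computed text-transform
// is "uppercase" and `greekToUpperCase` is your own conversion function:
//   transformTextNodes(elm, greekToUpperCase);
```

Note that this sketch also rewrites text inside descendants that override text-transform, so it may need a per-element check depending on the page.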

Answer 5:

This won't help with the Greek characters, but I was curious about finding all the elements with a given CSS property. I set this up: http://jsfiddle.net/pQfUv/1/

The bit that would interest you would be:

$('*').each(function() {
    if ($(this).css('text-transform') == 'uppercase') {
        // Do stuff to the element
    }
});

Looping through all the elements is probably a pretty expensive thing to do, though. Hope it helps.

Cheers, iso

Answer 6:

I can assure you that Greek is not the only language affected. You will almost certainly have problems with the German sharp S and the Turkish letters i as well.

I am not really sure what the purpose of these transformations was, but please keep in mind that many languages are written in scripts that have no concept of upper and lower case characters. If you use this for emphasis, I suggest removing all transforms altogether and simply writing that part of the text in the proper case. That way translators can decide on their own way to stress the word or sentence.

By the way, allowing span elements with a specific class in the translations might also be a good idea. That way somebody might use, e.g., color to mark text differently (although it wouldn't really help color-blind people).

Answer 7:

I find Otovo's answer the most elegant and the quickest. I would certainly not recommend scanning all elements for text-transform. For large pages on mobile devices the slowdown is noticeable.

Therefore I would recommend simply noting down all selectors with text-transform from the CSS files. This should be possible in most cases. Then use jQuery directly on those selectors.

So, to extend Otovo's answer, add a unique class per language, like i18n-el, somewhere such as the body (this is the default for Drupal, but anything similar would work). Then run:

$('.i18n-el').find('.all-relevant-selectors').attr('lang', 'el');

Obviously, replace .all-relevant-selectors with the selectors that you noted down from the CSS files, separated by commas.

Also, it is worth mentioning that in Chrome 39 this works only for text-transform: uppercase and not for font-variant: small-caps.
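
If noting the selectors down by hand is impractical, they could in principle be collected from the loaded style sheets instead. This is a minimal sketch, assuming the style sheets are same-origin and the relevant rules are not nested inside @media blocks; the function name collectUppercaseSelectors is hypothetical:

```javascript
// Hypothetical helper: list the selectors of top-level style rules that set
// text-transform: uppercase. Pass document.styleSheets in a browser.
function collectUppercaseSelectors(styleSheets) {
  var selectors = [];
  for (var i = 0; i < styleSheets.length; i++) {
    var rules;
    try {
      rules = styleSheets[i].cssRules; // may throw for cross-origin sheets
    } catch (err) {
      continue;
    }
    if (!rules) continue;
    for (var j = 0; j < rules.length; j++) {
      var rule = rules[j];
      // Only plain style rules have both `style` and `selectorText`;
      // rules nested in @media blocks are not visited in this sketch.
      if (rule.style && rule.selectorText &&
          rule.style.textTransform === 'uppercase') {
        selectors.push(rule.selectorText);
      }
    }
  }
  return selectors.join(', ');
}

// Browser usage with the jQuery call above:
//   $('.i18n-el').find(collectUppercaseSelectors(document.styleSheets)).attr('lang', 'el');
```

This still runs once per page load rather than once per element, so it keeps the main benefit of not scanning the whole DOM.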

Alternatively, there is a jQuery plugin for this matter called jquery-remove-upcase-accents, although I have not evaluated it at all.

Source: https://stackoverflow.com/questions/9434015/greek-and-text-transformuppercase

Tags

javascript

css

internationalization