How to check if a letter is uppercase or lowercase in PHP?

忘了有多久 2020-12-04 19:12

I have UTF-8 text that also contains diacritic characters, and I would like to check whether the first letter of the text is uppercase or lowercase. How can I do this?

13 Answers
  •  醉酒成梦
    2020-12-04 19:40

    In my opinion, a preg_ call is the most direct, concise, and reliable approach compared with the other solutions posted here.

    echo preg_match('~^\p{Lu}~u', $string) ? 'upper' : 'lower';
    

    My pattern breakdown:

    ~      # starting pattern delimiter
    ^      # match from the start of the input string
    \p{Lu} # match exactly one uppercase letter (Unicode general category Lu)
    ~      # ending pattern delimiter
    u      # UTF-8 mode: treat pattern and subject as UTF-8 code points, not bytes
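    The `u` modifier is what makes this safe for UTF-8 input. The sketch below (variable names are just for illustration) runs the same pattern with and without it. Without `u`, PCRE examines single bytes, and the first byte of a UTF-8 `â` (0xC3) is tested as if it were the code point U+00C3 `Ã`, which is an uppercase letter.

    ```php
    <?php
    $text = 'âa'; // starts with a lowercase letter; UTF-8 bytes C3 A2 61

    // With u: the subject is decoded as UTF-8, so \p{Lu} tests U+00E2 (â)
    $utf8 = preg_match('~^\p{Lu}~u', $text);

    // Without u: PCRE works byte by byte, so byte 0xC3 is tested as U+00C3 (Ã), category Lu
    $bytes = preg_match('~^\p{Lu}~', $text);

    echo 'with u:    ', $utf8  ? 'upper' : 'lower', "\n"; // lower
    echo 'without u: ', $bytes ? 'upper' : 'lower', "\n"; // upper (wrong)
    ```

    Every lowercase Latin-1 accented letter (à to ÿ) starts with the byte 0xC3 in UTF-8, so dropping `u` would misreport all of them as uppercase.
    
    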
    

    Notice where ctype_upper() and the < 'a' comparison fail in this battery of tests.

    Code: (Demo)

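    $tests = ['âa', 'Bbbbb', 'Éé', 'iou', 'Δδ'];
    
    foreach ($tests as $test) {
        // First character as a (possibly multibyte) UTF-8 string
        $chr = mb_substr($test, 0, 1, 'UTF-8');
    
        echo "\n{$test}:";
        // Unicode property check on the first code point
        echo "\n\tPREG:  " , preg_match('~^\p{Lu}~u', $test)       ? 'upper' : 'lower';
        // Checks each byte against the current locale, so multibyte letters fail
        echo "\n\tCTYPE: " , ctype_upper($chr)                      ? 'upper' : 'lower';
        // Byte-wise string comparison: only meaningful for ASCII letters
        echo "\n\t< a:   " , $chr < 'a'                             ? 'upper' : 'lower';
        // Case-mapping round trip: uppercasing an uppercase letter leaves it unchanged
        echo "\n\tMB:    " , mb_strtoupper($chr, 'UTF-8') === $chr ? 'upper' : 'lower';
    }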
    

    Output:

    âa:
        PREG:  lower
        CTYPE: lower
        < a:   lower
        MB:    lower
    Bbbbb:
        PREG:  upper
        CTYPE: upper
        < a:   upper
        MB:    upper
    Éé:               <-- trouble
        PREG:  upper
        CTYPE: lower  <-- uh oh
        < a:   lower  <-- uh oh
        MB:    upper
    iou:
        PREG:  lower
        CTYPE: lower
        < a:   lower
        MB:    lower
    Δδ:               <-- extended beyond question scope
        PREG:  upper  <-- still holding up
        CTYPE: lower
        < a:   lower
        MB:    upper  <-- still holding up
    

    If anyone needs to differentiate between uppercase letters, lowercase letters, and non-letters, see this post.
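    That post is not reproduced here, but as a rough sketch of the idea, the same approach extends naturally with `\p{Ll}` to give three outcomes. The function name `firstCharCase()` and its return labels are made up for this example.

    ```php
    <?php
    // Hypothetical helper: classify the first character of a UTF-8 string.
    function firstCharCase(string $text): string
    {
        if (preg_match('~^\p{Lu}~u', $text)) {
            return 'upper';
        }
        if (preg_match('~^\p{Ll}~u', $text)) {
            return 'lower';
        }
        // Non-letters, uncased letters (e.g. CJK), titlecase letters (Lt),
        // empty strings, and invalid UTF-8 (preg_match returns false) end up here
        return 'other';
    }

    foreach (['Éé', 'âa', '1abc', '', '中文', 'Δδ'] as $test) {
        echo var_export($test, true), ' => ', firstCharCase($test), "\n";
    }
    ```
    
    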


    It may extend the scope of this question too far, but if your input is especially squirrelly (its first character might be a cased letter that \p{Lu} does not cover, such as a titlecase letter), you may want to check whether the first character has case variants at all:

    \p{L&} or \p{Cased_Letter}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).

    • Source: https://www.regular-expressions.info/unicode.html
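    For example, U+01C5 `ǅ` is a titlecase letter (category Lt): it has upper- and lowercase variants, yet `\p{Lu}` does not match it. A minimal sketch using PCRE's `\p{L&}` class (the helper name `startsWithCasedLetter()` is hypothetical):

    ```php
    <?php
    // Hypothetical helper: true if the first character is a letter with case variants (Lu, Ll or Lt)
    function startsWithCasedLetter(string $text): bool
    {
        return preg_match('~^\p{L&}~u', $text) === 1;
    }

    $titlecase = "\u{01C5}"; // ǅ LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (Lt)

    var_dump(preg_match('~^\p{Lu}~u', $titlecase)); // int(0): not an uppercase letter
    var_dump(startsWithCasedLetter($titlecase));     // bool(true): but it is cased
    var_dump(startsWithCasedLetter('中文'));          // bool(false): CJK ideographs have no case
    ```
    
    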

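    To also treat Roman numeral characters (category Nl, "Number, Letter") as uppercase, you can add their range to the pattern if necessary: U+2160 to U+216F holds the capital forms, whose small variants sit at U+2170 to U+217F.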

    https://www.fileformat.info/info/unicode/category/Nl/list.htm

    Code: (Demo)

    echo preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $test) ? 'upper' : 'not upper';
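    A quick check of the extended class, with sample strings chosen for illustration: U+216B `Ⅻ` (ROMAN NUMERAL TWELVE) is in category Nl, so plain `\p{Lu}` rejects it while the added range accepts it, and its small counterpart U+217B `ⅻ` stays outside the range.

    ```php
    <?php
    $upperPattern = '~^[\p{Lu}\x{2160}-\x{216F}]~u';

    // Ⅻ (capital Roman numeral), ⅻ (small Roman numeral), and an ordinary uppercase letter
    foreach (["\u{216B}", "\u{217B}", 'Éé'] as $test) {
        echo $test,
            ': Lu only = ', preg_match('~^\p{Lu}~u', $test) ? 'upper' : 'not upper',
            ', with Nl range = ', preg_match($upperPattern, $test) ? 'upper' : 'not upper',
            "\n";
    }
    ```
    
    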
    
