Does anyone have a good Proper Case algorithm?

一生所求 2020-12-15 04:03

Does anyone have a trusted Proper Case or PCase algorithm (similar to a UCase or Upper)? I'm looking for something that takes a value such as "GEORGE BURDELL"

13 answers
  • 2020-12-15 05:00

    I use this in the TextChanged event handler of text boxes. With allowCapital left at True it keeps capitals typed inside a word, so it supports entry of names such as "McDonald".

    Public Shared Function DoProperCaseConvert(ByVal str As String, Optional ByVal allowCapital As Boolean = True) As String
        Dim strCon As String = ""
        Dim wordbreak As String = " ,.1234567890;/\-()#$%^&*€!~+=@"
        Dim nextShouldBeCapital As Boolean = True
    
        'Improve to recognize all caps input
        'If str.Equals(str.ToUpper) Then
        '    str = str.ToLower
        'End If
    
        For Each s As Char In str.ToCharArray
    
            If nextShouldBeCapital Then
                strCon &= Char.ToUpper(s)
            ElseIf allowCapital Then
                ' Keep the case as typed so that "McDonald" is preserved.
                strCon &= s
            Else
                strCon &= Char.ToLower(s)
            End If
    
            ' A word-break character means the next character starts a new word.
            nextShouldBeCapital = wordbreak.Contains(s.ToString)
        Next
    
        Return strCon
    End Function
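
    For readers outside .NET, the same word-break state machine can be sketched in Python. This is only an illustration of the approach above; the names `WORD_BREAKS` and `proper_case` are made up for the example, and the word-break set is copied from the VB.NET code.

```python
# Characters that end a word; the character after any of them is capitalized.
# (Assumption: same set as the VB.NET function above.)
WORD_BREAKS = set(" ,.1234567890;/\\-()#$%^&*€!~+=@")


def proper_case(text: str, allow_capital: bool = True) -> str:
    """Capitalize the first character of every word in `text`.

    With allow_capital=True, capitals typed inside a word are kept
    ("mcDonald" -> "McDonald"); with False they are lowercased.
    """
    out = []
    next_should_be_capital = True
    for ch in text:
        if next_should_be_capital:
            out.append(ch.upper())
        elif allow_capital:
            out.append(ch)
        else:
            out.append(ch.lower())
        # A word-break character means the next character starts a new word.
        next_should_be_capital = ch in WORD_BREAKS
    return "".join(out)
```

    Note that with `allow_capital=True` an all-caps input such as "GEORGE BURDELL" comes back unchanged, which is the case the commented-out "recognize all caps input" block is meant to address.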
    
  • 2020-12-15 05:01

    A lot of good answers here. Mine is pretty simple and only takes into account the names we have in our organization. You can expand it as you wish. This is not a perfect solution and will change vancouver to VanCouver, which is wrong. So tweak it if you use it.

    Here is my solution in C#. It hard-codes the name prefixes into the program, but with a little work you could keep the exceptions (e.g. Van, Mc, Mac) in a text file outside the program, read them in, and loop through them.

    using System.Text.RegularExpressions;
    
    public static String toProperName(String name)
    {
        if (name != null)
        {
            string lower = name.ToLower();
    
            if (lower.Length >= 2 && lower.Substring(0, 2) == "mc")  // Changes mcdonald to "McDonald"
                return "Mc" + Regex.Replace(lower.Substring(2), @"\b[a-z]", m => m.Value.ToUpper());
    
            if (lower.Length >= 3 && lower.Substring(0, 3) == "van")  // Changes vanwinkle to "VanWinkle"
                return "Van" + Regex.Replace(lower.Substring(3), @"\b[a-z]", m => m.Value.ToUpper());
    
            // Changes to title case and also fixes apostrophes, so O'HARE or o'hare becomes O'Hare
            return Regex.Replace(lower, @"\b[a-z]", m => m.Value.ToUpper());
        }
    
        return "";
    }
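
    The idea of keeping the prefixes outside the program and looping through them might look roughly like this. It is a sketch in Python rather than C#, and `load_prefixes`, `to_proper_name`, and the one-prefix-per-line file format are assumptions made for the example, not part of the answer's code.

```python
import re

# Hypothetical default list; the same prefixes the C# version hard-codes.
DEFAULT_PREFIXES = ["Mc", "Van"]


def load_prefixes(path):
    """Read one prefix per line from a text file (assumed format)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def _title(text):
    """Lowercase `text`, then uppercase each a-z letter that starts a word."""
    return re.sub(r"\b[a-z]", lambda m: m.group().upper(), text.lower())


def to_proper_name(name, prefixes=DEFAULT_PREFIXES):
    if name is None:
        return ""
    lowered = name.lower()
    for prefix in prefixes:
        if lowered.startswith(prefix.lower()):
            # Emit the prefix as written, then title-case the remainder.
            return prefix + _title(lowered[len(prefix):])
    return _title(lowered)
```

    As with the hard-coded version, any name that merely begins with a listed prefix is affected, so "vancouver" still becomes "VanCouver".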
    
  • 2020-12-15 05:02

    Unless I've misunderstood your question, I don't think you need to roll your own; the TextInfo class can do it for you.

    using System.Globalization;
    
    CultureInfo.InvariantCulture.TextInfo.ToTitleCase("GeOrGE bUrdEll")
    

    This returns "George Burdell". You can use your own culture if special rules are involved.

    Update: Michael (in a comment to this answer) pointed out that this will not work if the input is all caps since the method will assume that it is an acronym. The naive workaround for this is to .ToLower() the text before submitting it to ToTitleCase.

  • Kronoz, thank you. I found in your function that the line:

    if (!lowerWord.Contains(lowerPrefix)) return word;
    

    should read

    if (!lowerWord.StartsWith(lowerPrefix)) return word;
    

    so "información" is not changed to "InforMacIón"

    best,

    Enrique
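
    The difference between the two checks is easy to see in isolation. A small Python sketch (the helper names are invented for the illustration; Kronoz's function itself is not reproduced here):

```python
def prefix_anywhere(word: str, prefix: str) -> bool:
    """Substring test, the equivalent of Contains."""
    return prefix.lower() in word.lower()


def prefix_at_start(word: str, prefix: str) -> bool:
    """Anchored test, the equivalent of StartsWith."""
    return word.lower().startswith(prefix.lower())


for word in ("información", "MacDonald"):
    print(word, prefix_anywhere(word, "mac"), prefix_at_start(word, "mac"))
```

    "información" contains "mac" in the middle of the word, so only the anchored test correctly rejects it.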

  • 2020-12-15 05:07

    I did a quick C# port of https://github.com/tamtamchik/namecase, which is based on Lingua::EN::NameCase.

    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    
    public static class CIQNameCase
    {
        static Dictionary<string, string> _exceptions = new Dictionary<string, string>
            {
                {@"\bMacEdo"     ,"Macedo"},
                {@"\bMacEvicius" ,"Macevicius"},
                {@"\bMacHado"    ,"Machado"},
                {@"\bMacHar"     ,"Machar"},
                {@"\bMacHin"     ,"Machin"},
                {@"\bMacHlin"    ,"Machlin"},
                {@"\bMacIas"     ,"Macias"},
                {@"\bMacIulis"   ,"Maciulis"},
                {@"\bMacKie"     ,"Mackie"},
                {@"\bMacKle"     ,"Mackle"},
                {@"\bMacKlin"    ,"Macklin"},
                {@"\bMacKmin"    ,"Mackmin"},
                {@"\bMacQuarie"  ,"Macquarie"}
            };
    
        static Dictionary<string, string> _replacements = new Dictionary<string, string>
            {
                {@"\bAl(?=\s+\w)"         , @"al"},        // al Arabic or forename Al.
                {@"\bBin\b"               , @"bin"},       // bin Arabic
                {@"\bBinti\b"             , @"binti"},     // binti Arabic
                {@"\bBinte\b"             , @"binte"},     // binte Arabic
                {@"\bAp\b"                , @"ap"},        // ap Welsh.
                {@"\bBen(?=\s+\w)"        , @"ben"},       // ben Hebrew or forename Ben.
                {@"\bDell([ae])\b"        , @"dell$1"},    // della and delle Italian.
                {@"\bD([aeiou])\b"        , @"d$1"},       // da, de, di Italian; du French; do Brasil
                {@"\bD([ao]s)\b"          , @"d$1"},       // das, dos Brasileiros
                {@"\bDe([lrn])\b"         , @"de$1"},      // del Italian; der/den Dutch/Flemish.
                {@"\bEl\b"                , @"el"},        // el Greek or El Spanish.
                {@"\bLa\b"                , @"la"},        // la French or La Spanish.
                {@"\bL([eo])\b"           , @"l$1"},       // lo Italian; le French.
                {@"\bVan(?=\s+\w)"        , @"van"},       // van Dutch/Flemish or forename Van.
                {@"\bVon\b"               , @"von"}        // von German
            };
    
        static string[] _conjunctions = { "Y", "E", "I" };
    
        static string _romanRegex = @"\b((?:[Xx]{1,3}|[Xx][Ll]|[Ll][Xx]{0,3})?(?:[Ii]{1,3}|[Ii][VvXx]|[Vv][Ii]{0,3})?)\b";
    
        /// <summary>
        /// Case a name field into its appropriate case format 
        /// e.g. Smith, de la Cruz, Mary-Jane,  O'Brien, McTaggart
        /// </summary>
        /// <param name="nameString"></param>
        /// <returns></returns>
        public static string NameCase(string nameString)
        {
            // Capitalize
            nameString = Capitalize(nameString);
            nameString = UpdateIrish(nameString);
    
            // Fixes for "son (daughter) of" etc
            foreach (var replacement in _replacements.Keys)
            {
                if (Regex.IsMatch(nameString, replacement))
                {
                    Regex rgx = new Regex(replacement);
                    nameString = rgx.Replace(nameString, _replacements[replacement]);
                }                    
            }
    
            nameString = UpdateRoman(nameString);
            nameString = FixConjunction(nameString);
    
            return nameString;
        }
    
        /// <summary>
        /// Capitalize first letters.
        /// </summary>
        /// <param name="nameString"></param>
        /// <returns></returns>
        private static string Capitalize(string nameString)
        {
            nameString = nameString.ToLower();
            nameString = Regex.Replace(nameString, @"\b\w", x => x.ToString().ToUpper());
            nameString = Regex.Replace(nameString, @"'\w\b", x => x.ToString().ToLower()); // Lowercase 's
            return nameString;
        }
    
        /// <summary>
        /// Update for Irish names.
        /// </summary>
        /// <param name="nameString"></param>
        /// <returns></returns>
        private static string UpdateIrish(string nameString)
        {
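            // Note: a '^' that is not the first character of a character class is a literal,
            // so [A-Za-z^aciozj] matches any ASCII letter or '^'. Names such as Machado are
            // corrected afterwards by the _exceptions table in UpdateMac.
            if(Regex.IsMatch(nameString, @".*?\bMac[A-Za-z^aciozj]{2,}\b") || Regex.IsMatch(nameString, @".*?\bMc"))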
            {
                nameString = UpdateMac(nameString);
            }            
            return nameString;
        }
    
        /// <summary>
        /// Updates irish Mac & Mc.
        /// </summary>
        /// <param name="nameString"></param>
        /// <returns></returns>
        private static string UpdateMac(string nameString)
        {
            MatchCollection matches = Regex.Matches(nameString, @"\b(Ma?c)([A-Za-z]+)");
            if(matches.Count == 1 && matches[0].Groups.Count == 3)
            {
                string replacement = matches[0].Groups[1].Value;
                replacement += matches[0].Groups[2].Value.Substring(0, 1).ToUpper();
                replacement += matches[0].Groups[2].Value.Substring(1);
                nameString = nameString.Replace(matches[0].Groups[0].Value, replacement);
    
                // Now fix "Mac" exceptions
                foreach (var exception in _exceptions.Keys)
                {
                    nameString = Regex.Replace(nameString, exception, _exceptions[exception]);
                }
            }
            return nameString;
        }
    
        /// <summary>
        /// Fix roman numeral names.
        /// </summary>
        /// <param name="nameString"></param>
        /// <returns></returns>
        private static string UpdateRoman(string nameString)
        {
            // Uppercase each Roman numeral where it matched. Replacing through the
            // word-bounded pattern leaves the same letters inside other words untouched.
            return Regex.Replace(nameString, _romanRegex, x => x.Value.ToUpper());
        }
    
        /// <summary>
        /// Fix Spanish conjunctions.
        /// </summary>
        /// <param name="nameString"></param>
        /// <returns></returns>
        private static string FixConjunction(string nameString)
        {            
            foreach (var conjunction in _conjunctions)
            {
                nameString = Regex.Replace(nameString, @"\b" + conjunction + @"\b", x => x.ToString().ToLower());
            }
            return nameString;
        }
    }
    

    Usage

    string name_cased = CIQNameCase.NameCase("McCarthy");
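
    The lookahead in rules such as `\bVan(?=\s+\w)` is what separates a particle from a forename: the word is lowercased only when another word follows it. A reduced Python sketch of that one idea, using three of the rules above (`PARTICLES` and `lower_particles` are names invented for the example):

```python
import re

# A few of the particle rules from the C# table, in Python regex syntax.
PARTICLES = [
    (r"\bVan(?=\s+\w)", "van"),   # lowercase only when another word follows
    (r"\bVon\b", "von"),
    (r"\bD([aeiou])\b", r"d\1"),  # da, de, di, do, du
]


def lower_particles(name: str) -> str:
    """Lowercase name particles in an already capitalized name."""
    for pattern, replacement in PARTICLES:
        name = re.sub(pattern, replacement, name)
    return name


print(lower_particles("Van Dyke"))   # particle: another word follows
print(lower_particles("Van"))        # forename on its own: left alone
print(lower_particles("Vanessa"))    # not a separate word: left alone
```

    The trade-off is that a forename followed by a surname, such as "Van Morrison", is treated as a particle as well.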
    

    This is my test method; everything seems to pass:

    [TestMethod]
    public void Test_NameCase_1()
    {
        string[] names = {
            "Keith", "Yuri's", "Leigh-Williams", "McCarthy",
            // Mac exceptions
            "Machin", "Machlin", "Machar",
            "Mackle", "Macklin", "Mackie",
            "Macquarie", "Machado", "Macevicius",
            "Maciulis", "Macias", "MacMurdo",
            // General
            "O'Callaghan", "St. John", "von Streit",
            "van Dyke", "Van", "ap Llwyd Dafydd",
            "al Fahd", "Al",
            "el Grecco",
            "ben Gurion", "Ben",
            "da Vinci",
            "di Caprio", "du Pont", "de Legate",
            "del Crond", "der Sind", "van der Post", "van den Thillart",
            "von Trapp", "la Poisson", "le Figaro",
            "Mack Knife", "Dougal MacDonald",
            "Ruiz y Picasso", "Dato e Iradier", "Mas i Gavarró",
            // Roman numerals
            "Henry VIII", "Louis III", "Louis XIV",
            "Charles II", "Fred XLIX", "Yusof bin Ishak",
        };
    
        foreach(string name in names)
        {
            string name_upper = name.ToUpper();
            string name_cased = CIQNameCase.NameCase(name_upper);
            Console.WriteLine(string.Format("name: {0} -> {1}  -> {2}", name, name_upper, name_cased));
            Assert.AreEqual(name, name_cased);
        }
    
    }
    
  • 2020-12-15 05:08

    There's also this neat Perl script for title-casing text.

    http://daringfireball.net/2008/08/title_case_update

    #!/usr/bin/perl
    
    #     This filter changes all words to Title Caps, and attempts to be clever
    # about *un*capitalizing small words like a/an/the in the input.
    #
    # The list of "small words" which are not capped comes from
    # the New York Times Manual of Style, plus 'vs' and 'v'. 
    #
    # 10 May 2008
    # Original version by John Gruber:
    # http://daringfireball.net/2008/05/title_case
    #
    # 28 July 2008
    # Re-written and much improved by Aristotle Pagaltzis:
    # http://plasmasturm.org/code/titlecase/
    #
    #   Full change log at __END__.
    #
    # License: http://www.opensource.org/licenses/mit-license.php
    #
    
    
    use strict;
    use warnings;
    use utf8;
    use open qw( :encoding(UTF-8) :std );
    
    
    my @small_words = qw( (?<!q&)a an and as at(?!&t) but by en for if in of on or the to v[.]? via vs[.]? );
    my $small_re = join '|', @small_words;
    
    my $apos = qr/ (?: ['’] [[:lower:]]* )? /x;
    
    while ( <> ) {
      s{\A\s+}{}, s{\s+\z}{};
    
      $_ = lc $_ if not /[[:lower:]]/;
    
      s{
          \b (_*) (?:
              ( (?<=[ ][/\\]) [[:alpha:]]+ [-_[:alpha:]/\\]+ |   # file path or
                [-_[:alpha:]]+ [@.:] [-_[:alpha:]@.:/]+ $apos )  # URL, domain, or email
              |
              ( (?i: $small_re ) $apos )                         # or small word (case-insensitive)
              |
              ( [[:alpha:]] [[:lower:]'’()\[\]{}]* $apos )       # or word w/o internal caps
              |
              ( [[:alpha:]] [[:alpha:]'’()\[\]{}]* $apos )       # or some other word
          ) (_*) \b
      }{
          $1 . (
            defined $2 ? $2         # preserve URL, domain, or email
          : defined $3 ? "\L$3"     # lowercase small word
          : defined $4 ? "\u\L$4"   # capitalize word w/o internal caps
          : $5                      # preserve other kinds of word
          ) . $6
      }xeg;
    
    
      # Exceptions for small words: capitalize at start and end of title
      s{
          (  \A [[:punct:]]*         # start of title...
          |  [:.;?!][ ]+             # or of subsentence...
          |  [ ]['"“‘(\[][ ]*     )  # or of inserted subphrase...
          ( $small_re ) \b           # ... followed by small word
      }{$1\u\L$2}xig;
    
      s{
          \b ( $small_re )      # small word...
          (?= [[:punct:]]* \Z   # ... at the end of the title...
          |   ['"’”)\]] [ ] )   # ... or of an inserted subphrase?
      }{\u\L$1}xig;
    
      # Exceptions for small words in hyphenated compound words
      ## e.g. "in-flight" -> In-Flight
      s{
          \b
          (?<! -)                 # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (in-flight)
          ( $small_re )
          (?= -[[:alpha:]]+)      # lookahead for "-someword"
      }{\u\L$1}xig;
    
      ## # e.g. "Stand-in" -> "Stand-In" (Stand is already capped at this point)
      s{
          \b
          (?<!…)                  # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (stand-in)
          ( [[:alpha:]]+- )       # $1 = first word and hyphen, should already be properly capped
          ( $small_re )           # ... followed by small word
          (?! - )                 # Negative lookahead for another '-'
      }{$1\u$2}xig;
    
      print "$_";
    }
    
    __END__
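
    The core rule of the script (small words stay lowercase unless they open or close the title, and words with internal capitals are left alone) can be sketched in a few lines of Python. This is a heavily simplified illustration and not a port: it splits on whitespace only and ignores punctuation, URLs, file paths, hyphenated compounds, and the lookaround cases in the small-word list. `SMALL_WORDS` and `title_case` are names chosen for the example.

```python
# Small words from the Perl script, without its regex lookarounds.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "en", "for", "if",
               "in", "of", "on", "or", "the", "to", "v", "via", "vs"}


def title_case(title: str) -> str:
    # Like the script, lowercase input that has no lowercase letters at all.
    if not any(ch.islower() for ch in title):
        title = title.lower()
    words = title.split()
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        lower = word.lower()
        if lower in SMALL_WORDS:
            # Small words are lowercased, except as the first or last word.
            result.append(lower.capitalize() if i in (0, last) else lower)
        elif word[1:] == word[1:].lower():
            # No internal capitals: capitalize the first letter.
            result.append(word[0].upper() + word[1:])
        else:
            # Internal capitals (e.g. "iPhone"): preserve as written.
            result.append(word)
    return " ".join(result)
```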
    

    However, it sounds as though by proper case you mean casing for people's names only.
