Does anyone have a good Proper Case algorithm

前端未结

关注

 13  1219

Does anyone have a trusted Proper Case or PCase algorithm (similar to a UCase or Upper)? I\'m looking for something that takes a value such as \"GEORGE BURDELL\"


                      
              相关标签:


      
      
        
          13条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2020-12-15 05:00
              
            
            
                                                                       
I use this as the textchanged event handler of text boxes. Support entry of "McDonald"    

Public Shared Function DoProperCaseConvert(ByVal str As String, Optional ByVal allowCapital As Boolean = True) As String
    Dim strCon As String = ""
    Dim wordbreak As String = " ,.1234567890;/\-()#$%^&*€!~+=@"
    Dim nextShouldBeCapital As Boolean = True

    'Improve to recognize all caps input
    'If str.Equals(str.ToUpper) Then
    '    str = str.ToLower
    'End If

    For Each s As Char In str.ToCharArray

        If allowCapital Then
            strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, s)
        Else
            strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, s.ToLower)
        End If

        If wordbreak.Contains(s.ToString) Then
            nextShouldBeCapital = True
        Else
            nextShouldBeCapital = False
        End If
    Next

    Return strCon
End Function

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  长情又很酷        
                
              
                            
                2020-12-15 05:01
              
            
            
                                                                       
A lot of good answers here. Mine is pretty simple and only takes into account the names we have in our organization. You can expand it as you wish. This is not a perfect solution and will change vancouver to VanCouver, which is wrong. So tweak it if you use it.

Here was my solution in C#. This hard-codes the names into the program but with a little work you could keep a text file outside of the program and read in the name exceptions (i.e. Van, Mc, Mac) and loop through them. 

public static String toProperName(String name)
{
    if (name != null)
    {
        if (name.Length >= 2 && name.ToLower().Substring(0, 2) == "mc")  // Changes mcdonald to "McDonald"
            return "Mc" + Regex.Replace(name.ToLower().Substring(2), @"\b[a-z]", m => m.Value.ToUpper());

        if (name.Length >= 3 && name.ToLower().Substring(0, 3) == "van")  // Changes vanwinkle to "VanWinkle"
            return "Van" + Regex.Replace(name.ToLower().Substring(3), @"\b[a-z]", m => m.Value.ToUpper());

        return Regex.Replace(name.ToLower(), @"\b[a-z]", m => m.Value.ToUpper());  // Changes to title case but also fixes 
                                                                                   // appostrophes like O'HARE or o'hare to O'Hare
    }

    return "";
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  粉色の甜心        
                
              
                            
                2020-12-15 05:02
              
            
            
                                                                       
Unless I've misunderstood your question I don't think you need to roll your own, the TextInfo class can do it for you.

using System.Globalization;

CultureInfo.InvariantCulture.TextInfo.ToTitleCase("GeOrGE bUrdEll")


Will return "George Burdell. And you can use your own culture if there's some special rules involved.

Update: Michael (in a comment to this answer) pointed out that this will not work if the input is all caps since the method will assume that it is an acronym. The naive workaround for this is to .ToLower() the text before submitting it to ToTitleCase.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  不要未来只要你来        
                
              
                            
                2020-12-15 05:03
              
            
            
                                                                       
Kronoz, thank you. I found in your function that the line:

`if (!lowerWord.Contains(lowerPrefix)) return word`;


must say

if (!lowerWord.StartsWith(lowerPrefix)) return word;


so "información" is not changed to "InforMacIón"

best,

Enrique
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  青春惊慌失措        
                
              
                            
                2020-12-15 05:07
              
            
            
                                                                       
I did a quick C# port of https://github.com/tamtamchik/namecase, which is based on Lingua::EN::NameCase.

public static class CIQNameCase
{
    static Dictionary<string, string> _exceptions = new Dictionary<string, string>
        {
            {@"\bMacEdo"     ,"Macedo"},
            {@"\bMacEvicius" ,"Macevicius"},
            {@"\bMacHado"    ,"Machado"},
            {@"\bMacHar"     ,"Machar"},
            {@"\bMacHin"     ,"Machin"},
            {@"\bMacHlin"    ,"Machlin"},
            {@"\bMacIas"     ,"Macias"},
            {@"\bMacIulis"   ,"Maciulis"},
            {@"\bMacKie"     ,"Mackie"},
            {@"\bMacKle"     ,"Mackle"},
            {@"\bMacKlin"    ,"Macklin"},
            {@"\bMacKmin"    ,"Mackmin"},
            {@"\bMacQuarie"  ,"Macquarie"}
        };

    static Dictionary<string, string> _replacements = new Dictionary<string, string>
        {
            {@"\bAl(?=\s+\w)"         , @"al"},        // al Arabic or forename Al.
            {@"\b(Bin|Binti|Binte)\b" , @"bin"},       // bin, binti, binte Arabic
            {@"\bAp\b"                , @"ap"},        // ap Welsh.
            {@"\bBen(?=\s+\w)"        , @"ben"},       // ben Hebrew or forename Ben.
            {@"\bDell([ae])\b"        , @"dell$1"},    // della and delle Italian.
            {@"\bD([aeiou])\b"        , @"d$1"},       // da, de, di Italian; du French; do Brasil
            {@"\bD([ao]s)\b"          , @"d$1"},       // das, dos Brasileiros
            {@"\bDe([lrn])\b"         , @"de$1"},      // del Italian; der/den Dutch/Flemish.
            {@"\bEl\b"                , @"el"},        // el Greek or El Spanish.
            {@"\bLa\b"                , @"la"},        // la French or La Spanish.
            {@"\bL([eo])\b"           , @"l$1"},       // lo Italian; le French.
            {@"\bVan(?=\s+\w)"        , @"van"},       // van German or forename Van.
            {@"\bVon\b"               , @"von"}        // von Dutch/Flemish
        };

    static string[] _conjunctions = { "Y", "E", "I" };

    static string _romanRegex = @"\b((?:[Xx]{1,3}|[Xx][Ll]|[Ll][Xx]{0,3})?(?:[Ii]{1,3}|[Ii][VvXx]|[Vv][Ii]{0,3})?)\b";

    /// <summary>
    /// Case a name field into its appropriate case format 
    /// e.g. Smith, de la Cruz, Mary-Jane,  O'Brien, McTaggart
    /// </summary>
    /// <param name="nameString"></param>
    /// <returns></returns>
    public static string NameCase(string nameString)
    {
        // Capitalize
        nameString = Capitalize(nameString);
        nameString = UpdateIrish(nameString);

        // Fixes for "son (daughter) of" etc
        foreach (var replacement in _replacements.Keys)
        {
            if (Regex.IsMatch(nameString, replacement))
            {
                Regex rgx = new Regex(replacement);
                nameString = rgx.Replace(nameString, _replacements[replacement]);
            }                    
        }

        nameString = UpdateRoman(nameString);
        nameString = FixConjunction(nameString);

        return nameString;
    }

    /// <summary>
    /// Capitalize first letters.
    /// </summary>
    /// <param name="nameString"></param>
    /// <returns></returns>
    private static string Capitalize(string nameString)
    {
        nameString = nameString.ToLower();
        nameString = Regex.Replace(nameString, @"\b\w", x => x.ToString().ToUpper());
        nameString = Regex.Replace(nameString, @"'\w\b", x => x.ToString().ToLower()); // Lowercase 's
        return nameString;
    }

    /// <summary>
    /// Update for Irish names.
    /// </summary>
    /// <param name="nameString"></param>
    /// <returns></returns>
    private static string UpdateIrish(string nameString)
    {
        if(Regex.IsMatch(nameString, @".*?\bMac[A-Za-z^aciozj]{2,}\b") || Regex.IsMatch(nameString, @".*?\bMc"))
        {
            nameString = UpdateMac(nameString);
        }            
        return nameString;
    }

    /// <summary>
    /// Updates irish Mac & Mc.
    /// </summary>
    /// <param name="nameString"></param>
    /// <returns></returns>
    private static string UpdateMac(string nameString)
    {
        MatchCollection matches = Regex.Matches(nameString, @"\b(Ma?c)([A-Za-z]+)");
        if(matches.Count == 1 && matches[0].Groups.Count == 3)
        {
            string replacement = matches[0].Groups[1].Value;
            replacement += matches[0].Groups[2].Value.Substring(0, 1).ToUpper();
            replacement += matches[0].Groups[2].Value.Substring(1);
            nameString = nameString.Replace(matches[0].Groups[0].Value, replacement);

            // Now fix "Mac" exceptions
            foreach (var exception in _exceptions.Keys)
            {
                nameString = Regex.Replace(nameString, exception, _exceptions[exception]);
            }
        }
        return nameString;
    }

    /// <summary>
    /// Fix roman numeral names.
    /// </summary>
    /// <param name="nameString"></param>
    /// <returns></returns>
    private static string UpdateRoman(string nameString)
    {
        MatchCollection matches = Regex.Matches(nameString, _romanRegex);
        if (matches.Count > 1)
        {
            foreach(Match match in matches)
            {
                if(!string.IsNullOrEmpty(match.Value))
                {
                    nameString = Regex.Replace(nameString, match.Value, x => x.ToString().ToUpper());
                }
            }
        }
        return nameString;
    }

    /// <summary>
    /// Fix Spanish conjunctions.
    /// </summary>
    /// <param name=""></param>
    /// <returns></returns>
    private static string FixConjunction(string nameString)
    {            
        foreach (var conjunction in _conjunctions)
        {
            nameString = Regex.Replace(nameString, @"\b" + conjunction + @"\b", x => x.ToString().ToLower());
        }
        return nameString;
    }
}


Usage

string name_cased = CIQNameCase.NameCase("McCarthy");


This is my test method, everything seems to pass OK:

[TestMethod]
public void Test_NameCase_1()
{
    string[] names = {
        "Keith", "Yuri's", "Leigh-Williams", "McCarthy",
        // Mac exceptions
        "Machin", "Machlin", "Machar",
        "Mackle", "Macklin", "Mackie",
        "Macquarie", "Machado", "Macevicius",
        "Maciulis", "Macias", "MacMurdo",
        // General
        "O'Callaghan", "St. John", "von Streit",
        "van Dyke", "Van", "ap Llwyd Dafydd",
        "al Fahd", "Al",
        "el Grecco",
        "ben Gurion", "Ben",
        "da Vinci",
        "di Caprio", "du Pont", "de Legate",
        "del Crond", "der Sind", "van der Post", "van den Thillart",
        "von Trapp", "la Poisson", "le Figaro",
        "Mack Knife", "Dougal MacDonald",
        "Ruiz y Picasso", "Dato e Iradier", "Mas i Gavarró",
        // Roman numerals
        "Henry VIII", "Louis III", "Louis XIV",
        "Charles II", "Fred XLIX", "Yusof bin Ishak",
    };

    foreach(string name in names)
    {
        string name_upper = name.ToUpper();
        string name_cased = CIQNameCase.NameCase(name_upper);
        Console.WriteLine(string.Format("name: {0} -> {1}  -> {2}", name, name_upper, name_cased));
        Assert.IsTrue(name == name_cased);
    }

}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  Happy的楠姐        
                
              
                            
                2020-12-15 05:08
              
            
            
                                                                       
There's also this neat Perl script for title-casing text.

http://daringfireball.net/2008/08/title_case_update


#!/usr/bin/perl

#     This filter changes all words to Title Caps, and attempts to be clever
# about *un*capitalizing small words like a/an/the in the input.
#
# The list of "small words" which are not capped comes from
# the New York Times Manual of Style, plus 'vs' and 'v'. 
#
# 10 May 2008
# Original version by John Gruber:
# http://daringfireball.net/2008/05/title_case
#
# 28 July 2008
# Re-written and much improved by Aristotle Pagaltzis:
# http://plasmasturm.org/code/titlecase/
#
#   Full change log at __END__.
#
# License: http://www.opensource.org/licenses/mit-license.php
#


use strict;
use warnings;
use utf8;
use open qw( :encoding(UTF-8) :std );


my @small_words = qw( (?<!q&)a an and as at(?!&t) but by en for if in of on or the to v[.]? via vs[.]? );
my $small_re = join '|', @small_words;

my $apos = qr/ (?: ['’] [[:lower:]]* )? /x;

while ( <> ) {
  s{\A\s+}{}, s{\s+\z}{};

  $_ = lc $_ if not /[[:lower:]]/;

  s{
      \b (_*) (?:
          ( (?<=[ ][/\\]) [[:alpha:]]+ [-_[:alpha:]/\\]+ |   # file path or
            [-_[:alpha:]]+ [@.:] [-_[:alpha:]@.:/]+ $apos )  # URL, domain, or email
          |
          ( (?i: $small_re ) $apos )                         # or small word (case-insensitive)
          |
          ( [[:alpha:]] [[:lower:]'’()\[\]{}]* $apos )       # or word w/o internal caps
          |
          ( [[:alpha:]] [[:alpha:]'’()\[\]{}]* $apos )       # or some other word
      ) (_*) \b
  }{
      $1 . (
        defined $2 ? $2         # preserve URL, domain, or email
      : defined $3 ? "\L$3"     # lowercase small word
      : defined $4 ? "\u\L$4"   # capitalize word w/o internal caps
      : $5                      # preserve other kinds of word
      ) . $6
  }xeg;


  # Exceptions for small words: capitalize at start and end of title
  s{
      (  \A [[:punct:]]*         # start of title...
      |  [:.;?!][ ]+             # or of subsentence...
      |  [ ]['"“‘(\[][ ]*     )  # or of inserted subphrase...
      ( $small_re ) \b           # ... followed by small word
  }{$1\u\L$2}xig;

  s{
      \b ( $small_re )      # small word...
      (?= [[:punct:]]* \Z   # ... at the end of the title...
      |   ['"’”)\]] [ ] )   # ... or of an inserted subphrase?
  }{\u\L$1}xig;

  # Exceptions for small words in hyphenated compound words
  ## e.g. "in-flight" -> In-Flight
  s{
      \b
      (?<! -)                 # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (in-flight)
      ( $small_re )
      (?= -[[:alpha:]]+)      # lookahead for "-someword"
  }{\u\L$1}xig;

  ## # e.g. "Stand-in" -> "Stand-In" (Stand is already capped at this point)
  s{
      \b
      (?<!…)                  # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (stand-in)
      ( [[:alpha:]]+- )       # $1 = first word and hyphen, should already be properly capped
      ( $small_re )           # ... followed by small word
      (?! - )                 # Negative lookahead for another '-'
  }{$1\u$2}xig;

  print "$_";
}

__END__



But it sounds like by proper case you mean.. for people's names only.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
3
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复