Does anyone have a good Proper Case algorithm

前端未结

关注

 13  1218

Does anyone have a trusted Proper Case or PCase algorithm (similar to a UCase or Upper)? I\'m looking for something that takes a value such as \"GEORGE BURDELL\"


                      
              相关标签:


      
      
        
          13条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2020-12-15 04:46
              
            
            
                                                                       
You do not mention which language you would like the solution in so here is some pseudo code.

Loop through each character
    If the previous character was an alphabet letter
        Make the character lower case
    Otherwise
        Make the character upper case
End loop

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-12-15 04:50
              
            
            
                                                                       
a simple way to capitalise the first letter of each word (seperated by a space)

$words = explode(” “, $string);
for ($i=0; $i<count($words); $i++) {
$s = strtolower($words[$i]);
$s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
$result .= “$s “;
}
$string = trim($result);


in terms of catching the "O'REILLY" example you gave
splitting the string on both spaces and ' would not work as it would capitalise any letter that appeared after a apostraphe i.e. the s in Fred's

so i would probably try something like 

$words = explode(” “, $string);
for ($i=0; $i<count($words); $i++) {

$s = strtolower($words[$i]);

if (substr($s, 0, 2) === "o'"){
$s = substr_replace($s, strtoupper(substr($s, 0, 3)), 0, 3);
}else{
$s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
}
$result .= “$s “;
}
$string = trim($result);


This should catch O'Reilly, O'Clock, O'Donnell etc hope it helps

Please note this code is untested.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-12-15 04:53
              
            
            
                                                                       
What programming language do you use? Many languages allow callback functions for regular expression matches. These can be used to propercase the match easily. The regular expression that would be used is quite simple, you just have to match all word characters, like so:

/\w+/


Alternatively, you can already extract the first character to be an extra match:

/(\w)(\w*)/


Now you can access the first character and successive characters in the match separately. The callback function can then simply return a concatenation of the hits. In pseudo Python (I don't actually know Python):

def make_proper(match):
    return match[1].to_upper + match[2]


Incidentally, this would also handle the case of “O'Reilly” because “O” and “Reilly” would be matched separately and both propercased. There are however other special cases that are not handled well by the algorithm, e.g. “McDonald's” or generally any apostrophed word. The algorithm would produce “Mcdonald'S” for the latter. A special handling for apostrophe could be implemented but that would interfere with the first case. Finding a thereotical perfect solution isn't possible. In practice, it might help considering the length of the part after the apostrophe.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  隐瞒了意图╮        
                
              
                            
                2020-12-15 04:58
              
            
            
                                                                       
I know this thread has been open for awhile, but as I was doing research for this problem I came across this nifty site, which allows you to paste in names to be capitalized quite quickly: https://dialect.ca/code/name-case/. I wanted to include it here for reference for others doing similar research/projects.

They release the algorithm they have written in php at this link: https://dialect.ca/code/name-case/name_case.phps

A preliminary test and reading of their code suggests they have been quite thorough.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  闹比i        
                
              
                            
                2020-12-15 04:59
              
            
            
                                                                       
@Zack: I'll post it as a separate reply. 

Here's an example based on kronoz's post. 

void Main()
{
    List<string> names = new List<string>() {
        "bill o'reilly", 
        "johannes diderik van der waals", 
        "mr. moseley-williams", 
        "Joe VanWyck", 
        "mcdonald's", 
        "william the third", 
        "hrh prince charles", 
        "h.r.m. queen elizabeth the third",
        "william gates, iii", 
        "pope leo xii",
        "a.k. jennings"
    };

    names.Select(name => name.ToProperCase()).Dump();
}

// http://stackoverflow.com/questions/32149/does-anyone-have-a-good-proper-case-algorithm
public static class ProperCaseHelper
{
    public static string ToProperCase(this string input)
    {
        if (IsAllUpperOrAllLower(input))
        {
            // fix the ALL UPPERCASE or all lowercase names
            return string.Join(" ", input.Split(' ').Select(word => wordToProperCase(word)));
        }
        else
        {
            // leave the CamelCase or Propercase names alone
            return input;
        }
    }

    public static bool IsAllUpperOrAllLower(this string input)
    {
        return (input.ToLower().Equals(input) || input.ToUpper().Equals(input));
    }

    private static string wordToProperCase(string word)
    {
        if (string.IsNullOrEmpty(word)) return word;

        // Standard case
        string ret = capitaliseFirstLetter(word);

        // Special cases:
        ret = properSuffix(ret, "'");   // D'Artagnon, D'Silva
        ret = properSuffix(ret, ".");   // ???
        ret = properSuffix(ret, "-");       // Oscar-Meyer-Weiner
        ret = properSuffix(ret, "Mc", t => t.Length > 4);      // Scots
        ret = properSuffix(ret, "Mac", t => t.Length > 5);     // Scots except Macey

        // Special words:
        ret = specialWords(ret, "van");     // Dick van Dyke
        ret = specialWords(ret, "von");     // Baron von Bruin-Valt
        ret = specialWords(ret, "de");
        ret = specialWords(ret, "di");
        ret = specialWords(ret, "da");      // Leonardo da Vinci, Eduardo da Silva
        ret = specialWords(ret, "of");      // The Grand Old Duke of York
        ret = specialWords(ret, "the");     // William the Conqueror
        ret = specialWords(ret, "HRH");     // His/Her Royal Highness
        ret = specialWords(ret, "HRM");     // His/Her Royal Majesty
        ret = specialWords(ret, "H.R.H.");  // His/Her Royal Highness
        ret = specialWords(ret, "H.R.M.");  // His/Her Royal Majesty

        ret = dealWithRomanNumerals(ret);   // William Gates, III

        return ret;
    }

    private static string properSuffix(string word, string prefix, Func<string, bool> condition = null)
    {
        if (string.IsNullOrEmpty(word)) return word;
        if (condition != null && ! condition(word)) return word;

        string lowerWord = word.ToLower();
        string lowerPrefix = prefix.ToLower();

        if (!lowerWord.Contains(lowerPrefix)) return word;

        int index = lowerWord.IndexOf(lowerPrefix);

        // If the search string is at the end of the word ignore.
        if (index + prefix.Length == word.Length) return word;

        return word.Substring(0, index) + prefix +
            capitaliseFirstLetter(word.Substring(index + prefix.Length));
    }

    private static string specialWords(string word, string specialWord)
    {
        if (word.Equals(specialWord, StringComparison.InvariantCultureIgnoreCase))
        {
            return specialWord;
        }
        else
        {
            return word;
        }
    }

    private static string dealWithRomanNumerals(string word)
    {
        // Roman Numeral parser thanks to [Hannobo](https://stackoverflow.com/users/785111/hannobo)
        // Note that it excludes the Chinese last name Xi
        return new Regex(@"\b(?!Xi\b)(X|XX|XXX|XL|L|LX|LXX|LXXX|XC|C)?(I|II|III|IV|V|VI|VII|VIII|IX)?\b", RegexOptions.IgnoreCase).Replace(word, match => match.Value.ToUpperInvariant());
    }

    private static string capitaliseFirstLetter(string word)
    {
        return char.ToUpper(word[0]) + word.Substring(1).ToLower();
    }

}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  我寻月下人不归        
                
              
                            
                2020-12-15 05:00
              
            
            
                                                                       
Here's a perhaps naive C# implementation:-

public class ProperCaseHelper {
  public string ToProperCase(string input) {
    string ret = string.Empty;

    var words = input.Split(' ');

    for (int i = 0; i < words.Length; ++i) {
      ret += wordToProperCase(words[i]);
      if (i < words.Length - 1) ret += " ";
    }

    return ret;
  }

  private string wordToProperCase(string word) {
    if (string.IsNullOrEmpty(word)) return word;

    // Standard case
    string ret = capitaliseFirstLetter(word);

    // Special cases:
    ret = properSuffix(ret, "'");
    ret = properSuffix(ret, ".");
    ret = properSuffix(ret, "Mc");
    ret = properSuffix(ret, "Mac");

    return ret;
  }

  private string properSuffix(string word, string prefix) {
    if(string.IsNullOrEmpty(word)) return word;

    string lowerWord = word.ToLower(), lowerPrefix = prefix.ToLower();
    if (!lowerWord.Contains(lowerPrefix)) return word;

    int index = lowerWord.IndexOf(lowerPrefix);

    // If the search string is at the end of the word ignore.
    if (index + prefix.Length == word.Length) return word;

    return word.Substring(0, index) + prefix +
      capitaliseFirstLetter(word.Substring(index + prefix.Length));
  }

  private string capitaliseFirstLetter(string word) {
    return char.ToUpper(word[0]) + word.Substring(1).ToLower();
  }
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
3
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复