Heuristics for splitting full names

Submitted by 谁说胖子不能爱 on 2020-01-05 05:52:16

Question

Splitting a full name into first and last names is an unsolvable problem because names are really, really complicated. As a result, my model, which represents authors and other contributors to a book, includes both name and filingName fields, where filingName should usually be "Last, First" (for Western names).

However, as a convenience for my users, I'd like to have my app make a reasonable guess at the filing name when the user fills in the regular name. The user can edit the filing name if the guess is wrong, of course, but if I guess right, I'll have saved them some time. Currently I'm simply assuming the last space-separated "word" is the last name and moving it to the front with a comma:

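// Split on whitespace, then drop the empty strings that leading, trailing,
// or repeated spaces would otherwise produce (e.g. "John  Smith").
NSMutableArray * parts = [self.name componentsSeparatedByCharactersInSet:NSCharacterSet.whitespaceCharacterSet].mutableCopy;
[parts removeObject:@""];

// A single word has nothing to reorder.
if (parts.count < 2) {
    return self.name;
}

// Treat the last word as the surname and move it to the front.
NSString * lastName = parts.lastObject;
[parts removeLastObject];

return [NSString stringWithFormat:@"%@, %@", lastName, [parts componentsJoinedByString:@" "]];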

I can immediately think of one case where this will lead me astray: suffixes like "Jr". But I'm sure there are many others. Are there any good resources explaining common naming caveats, or good examples of code tackling this problem, that I can use to improve my heuristic? I'm using Objective-C on the Mac (in case there's some obscure corner of a framework that could help me), but I'm willing to learn from code written in any language.
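One way to handle the suffix case mentioned above is to peel recognized suffixes off the end of the name before choosing the surname, then re-append them after the given names. The following is a minimal sketch, assuming ARC and Foundation. The function names `IsNameSuffix` and `FilingNameMovingSuffixes` are hypothetical, and the suffix list is deliberately short rather than exhaustive.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper with a deliberately short suffix list. The comparison
// ignores case and surrounding periods/commas, so "Jr", "jr." and "JR." match.
static BOOL IsNameSuffix(NSString *word) {
    NSCharacterSet *punctuation = [NSCharacterSet characterSetWithCharactersInString:@".,"];
    NSString *bare = [[word stringByTrimmingCharactersInSet:punctuation] lowercaseString];
    return [@[@"jr", @"sr", @"ii", @"iii", @"iv"] containsObject:bare];
}

// Last-word-is-the-surname heuristic, except that trailing suffixes are set
// aside first and re-appended after the given names:
// "Martin Luther King, Jr." -> "King, Martin Luther, Jr."
NSString *FilingNameMovingSuffixes(NSString *name) {
    NSMutableArray *parts = [[name componentsSeparatedByCharactersInSet:
                              NSCharacterSet.whitespaceCharacterSet] mutableCopy];
    [parts removeObject:@""];  // empties come from repeated or edge whitespace

    // Peel suffixes off the end (keeping their order), but always leave at
    // least two words so there is still a given name and a surname.
    NSMutableArray *suffixes = [NSMutableArray array];
    while (parts.count > 2 && IsNameSuffix(parts.lastObject)) {
        [suffixes insertObject:parts.lastObject atIndex:0];
        [parts removeLastObject];
    }

    if (parts.count < 2) {
        return name;  // a single word: nothing to reorder
    }

    // Drop the comma in "King, Jr." style input before moving the surname.
    NSCharacterSet *commas = [NSCharacterSet characterSetWithCharactersInString:@","];
    NSString *lastName = [parts.lastObject stringByTrimmingCharactersInSet:commas];
    [parts removeLastObject];

    NSString *filing = [NSString stringWithFormat:@"%@, %@", lastName,
                        [parts componentsJoinedByString:@" "]];
    if (suffixes.count > 0) {
        filing = [filing stringByAppendingFormat:@", %@",
                  [suffixes componentsJoinedByString:@" "]];
    }
    return filing;
}
```

A two-word name whose second word is a suffix (a bare "Smith Jr.") is still misfiled as "Jr., Smith", because the loop refuses to leave fewer than two words. That case is left for the user to correct by hand.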

This sort of question has been asked before, but most answers either focus on the mechanics of splitting apart a string, or devolve into "design your model differently". I am designing my model differently; I'm just looking to let the computer do most of my users' work for them.
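Letting the computer do the work while still respecting the user's corrections can be handled in the model itself: the filing name follows a guess derived from the name until the user edits the filing name directly, after which later name changes leave it alone. The following is a minimal sketch, assuming ARC. `Contributor`, `filingNameWasEdited`, and `guessedFilingNameForName:` are hypothetical names, and the guess method is only a stand-in for whatever heuristic is finally used. The guess is written through `self.filingName` rather than the instance variable so that key-value observers (such as Cocoa bindings) still see the change.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model class: filingName tracks a guess derived from name
// until the user sets filingName directly; after that it is left alone.
@interface Contributor : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, copy) NSString *filingName;
@property (nonatomic, readonly) BOOL filingNameWasEdited;
@end

@implementation Contributor {
    BOOL _updatingGuess;  // YES only while setName: writes the guess
}

// Stand-in heuristic: move the last word to the front.
+ (NSString *)guessedFilingNameForName:(NSString *)name {
    NSMutableArray *parts = [[name componentsSeparatedByCharactersInSet:
                              NSCharacterSet.whitespaceCharacterSet] mutableCopy];
    [parts removeObject:@""];
    if (parts.count < 2) {
        return name;
    }
    NSString *lastName = parts.lastObject;
    [parts removeLastObject];
    return [NSString stringWithFormat:@"%@, %@", lastName,
            [parts componentsJoinedByString:@" "]];
}

- (void)setName:(NSString *)name {
    _name = [name copy];
    if (!_filingNameWasEdited) {
        // Go through the setter so KVO observers of filingName are notified.
        _updatingGuess = YES;
        self.filingName = [Contributor guessedFilingNameForName:name];
        _updatingGuess = NO;
    }
}

- (void)setFilingName:(NSString *)filingName {
    _filingName = [filingName copy];
    if (!_updatingGuess) {
        _filingNameWasEdited = YES;  // a user edit: stop overwriting it
    }
}
@end
```

Whether clearing the filing name should switch guessing back on is a separate design choice; this sketch never resets the flag.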

As I said earlier, this code is mainly handling the names of authors and other contributors to books. Some of the specific ramifications of that include:

  • The name field should hold only one person's name, because I support attaching multiple authors to a book.
  • Most names will not have titles, but professional titles like "Dr." could show up. Ideally these would be discarded, not treated as part of the first name.
  • The names will usually be of people, but could sometimes be of organizations. I'm perfectly willing to risk mangling organization names to get better person name handling.
  • I expect I will mostly be handling European names, although detecting the orthography of the name should not be difficult.
  • The code should not be particularly sensitive to the user's locale.
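The constraints in the list above suggest a few refinements to the last-word heuristic. These include discarding leading titles, keeping surname particles such as "van" or "de" attached to the surname, and leaving names in Chinese, Japanese, or Korean script unreordered, since the family name is conventionally written first in those scripts. The following is a minimal sketch, assuming ARC and Foundation. All helper names and word lists are hypothetical and short. Keeping particles with the surname is one choice, not a cataloguing standard: filing rules for particles differ between languages and library traditions. The script check looks only at a few Basic Multilingual Plane blocks. Matching uses `lowercaseString`, which does not depend on the user's locale.

```objc
#import <Foundation/Foundation.h>

// Lowercased word with surrounding periods/commas removed ("Dr." -> "dr").
static NSString *BareLowercaseWord(NSString *word) {
    NSCharacterSet *punct = [NSCharacterSet characterSetWithCharactersInString:@".,"];
    return [[word stringByTrimmingCharactersInSet:punct] lowercaseString];
}

// Hypothetical, deliberately short lists; extend them as real data demands.
static BOOL IsTitle(NSString *word) {
    return [@[@"dr", @"prof", @"mr", @"mrs", @"ms", @"sir"]
            containsObject:BareLowercaseWord(word)];
}

static BOOL IsSurnameParticle(NSString *word) {
    return [@[@"van", @"von", @"der", @"den", @"de", @"du", @"da", @"di", @"la", @"le"]
            containsObject:BareLowercaseWord(word)];
}

// YES if any UTF-16 unit falls in Hiragana/Katakana, CJK Unified Ideographs,
// or Hangul Syllables (BMP blocks only; a heuristic, not full script detection).
static BOOL ContainsCJK(NSString *s) {
    for (NSUInteger i = 0; i < s.length; i++) {
        unichar c = [s characterAtIndex:i];
        if ((c >= 0x3040 && c <= 0x30FF) ||
            (c >= 0x4E00 && c <= 0x9FFF) ||
            (c >= 0xAC00 && c <= 0xD7AF)) {
            return YES;
        }
    }
    return NO;
}

NSString *GuessFilingName(NSString *name) {
    NSString *trimmed = [name stringByTrimmingCharactersInSet:
                         NSCharacterSet.whitespaceCharacterSet];
    // Family name already comes first in these scripts: leave it alone.
    if (ContainsCJK(trimmed)) {
        return trimmed;
    }
    NSMutableArray *parts = [[trimmed componentsSeparatedByCharactersInSet:
                              NSCharacterSet.whitespaceCharacterSet] mutableCopy];
    [parts removeObject:@""];

    // Discard leading titles such as "Dr.", but never the last remaining word.
    while (parts.count > 1 && IsTitle(parts.firstObject)) {
        [parts removeObjectAtIndex:0];
    }
    if (parts.count < 2) {
        return [parts componentsJoinedByString:@" "];
    }

    // Surname = last word plus any particles directly before it, while still
    // leaving at least one given-name word: "Ludwig van Beethoven".
    NSUInteger start = parts.count - 1;
    while (start > 1 && IsSurnameParticle(parts[start - 1])) {
        start--;
    }
    NSString *surname = [[parts subarrayWithRange:NSMakeRange(start, parts.count - start)]
                         componentsJoinedByString:@" "];
    NSString *given = [[parts subarrayWithRange:NSMakeRange(0, start)]
                       componentsJoinedByString:@" "];
    return [NSString stringWithFormat:@"%@, %@", surname, given];
}
```

Organization names go through the same path, so "Oxford University Press" comes out as "Press, Oxford University". The list above explicitly accepts that kind of mangling.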

Answer 1:

When you build a software system, there are always serious problems that consume a lot of time. I wouldn't get stuck on this one, because there are no worldwide naming conventions or rules. I don't think asking users to enter their filing name will be a bother, since they'll do it only once.

That seems to be the easier solution IMHO.

Source: https://stackoverflow.com/questions/15597474/heuristics-for-splitting-full-names
