How can I convert text to title case?

Submitted by 早过忘川 on 2021-01-27 13:17:02

Question


I have a text file containing a list of titles that I need to change to title case (words should begin with a capital letter except for most articles, conjunctions, and prepositions).

For example, this list of book titles:

barbarians at the gate 
hot, flat, and crowded 
A DAY LATE AND A DOLLAR SHORT 
THE HITCHHIKER'S GUIDE TO THE GALAXY

should be changed to:

Barbarians at the Gate 
Hot, Flat, and Crowded 
A Day Late and a Dollar Short 
The Hitchhiker's Guide to the Galaxy

I wrote the following code:

while(<DATA>)
{
    $_=~s/(\s+)([a-z])/$1.uc($2)/eg;
    print $_;
}

But it capitalizes the first letter of every word, even words like "at," "the," and "a" in the middle of a title:

Barbarians At The Gate 
Hot, Flat, And Crowded 
A Day Late And A Dollar Short 
The Hitchhiker's Guide To The Galaxy

How can I do this?


Answer 1:


Thanks to Håkon Hægland, whose comment ("See also Lingua::EN::Titlecase") pointed to a module that produces the desired output.

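use strict;
use warnings;
use Lingua::EN::Titlecase;

while (my $line = <DATA>)
{
    chomp $line;    # drop the trailing newline before title-casing
    # Build a title-casing object for this line; printing it yields the result.
    my $tc = Lingua::EN::Titlecase->new($line);
    print "$tc\n";
}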



Answer 2:


You can also try this regex: ^(.)(.*?)\b|\b(at|to|that|and|this|the|a|is|was)\b|\b(\w)([\w']*?(?:[^\w'-]|$)) and replace with \U$1\L$2$3\U$4\L$5. It works by matching the first letter of each word that is not a connecting word, capitalizing it, then matching the rest of the word. Add the case-insensitive flag if the input may contain upper-case connecting words such as "AND". This seems to work in PHP; I don't know about Perl, but it will likely work.

  • ^(.)(.*?)\b matches the first letter of the first word (group 1) and the rest of the word (group 2). This ensures the first word is capitalized even when it is an article.
  • \b(word|multiple words|...)\b matches any connecting word (group 3) so that it is emitted in lower case instead of being capitalized.
  • (\w)([\w']*?(?:[^\w'-]|$)) matches the first letter of a word (group 4) and the rest of the word (group 5). Here I used [^\w'-] instead of \b so hyphens and apostrophes are counted as word characters too. This prevents 's from becoming 'S.

The \U in the replacement upper-cases the characters that follow it, and \L lower-cases them. If you want, you can add more articles or other words to the regex to keep them from being capitalized.
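For the Perl side of the question, here is a minimal sketch of how these two escapes behave inside a substitution. The sample word and the variable names are only illustrative and do not come from the answer above.

```perl
use strict;
use warnings;

my $word = "hITCHHIKER'S";

# \U upper-cases what follows it; the later \L takes over and
# lower-cases the rest of the replacement.
(my $fixed = $word) =~ s/^(\w)(.*)$/\U$1\L$2/;

print "$fixed\n";    # prints: Hitchhiker's
```

In Perl, a new \L or \U replaces the one that was active before it, so no \E is needed between the two parts.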

UPDATE: I changed the regex so you can include connecting phrases too (multiple words). But that will still make a very long regex...
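If the single regex becomes too long to maintain, the same idea can be sketched in Perl with a lookup table instead of an alternation. This is only a rough sketch under stated assumptions: the %small word list and the title_case function are hypothetical names chosen for illustration, and the list should be adjusted to whatever style guide applies.

```perl
use strict;
use warnings;

# Hypothetical list of "small" words that stay lower case mid-title.
my %small = map { $_ => 1 } qw(a an and at but by for in of on or the to);

sub title_case {
    my ($title) = @_;
    my $out = lc $title;    # normalize first so ALL-CAPS input works too

    # Capitalize every word (apostrophes and hyphens count as part of
    # the word) unless it appears in the small-word table.
    $out =~ s/([\w'-]+)/ $small{$1} ? $1 : ucfirst $1 /ge;

    # The first word is capitalized even when it is a small word.
    $out =~ s/^(\s*)(\w)/$1\u$2/;

    return $out;
}

my @titles = (
    "barbarians at the gate",
    "hot, flat, and crowded",
    "A DAY LATE AND A DOLLAR SHORT",
    "THE HITCHHIKER'S GUIDE TO THE GALAXY",
);

print title_case($_), "\n" for @titles;
```

Because the input is lower-cased first, acronyms such as "NASA" would lose their capitals, and the last word of a title is not given special treatment. Both would need extra handling if they matter.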



Source: https://stackoverflow.com/questions/41059633/how-can-i-convert-text-to-title-case
