Changing the case of a string with awk

Submitted by 一曲冷凌霜 on 2020-01-22 18:33:27

Question


I'm an awk newbie, so please bear with me.

The goal is to change the case of a string such that the first letter of every word is uppercase and the remaining letters are lowercase. (To keep the example simple, "word" is defined here as strictly alphabetic characters; all others are considered separators.)

I learned a nice way to make the first letter of every word uppercase from another post on this website using the following awk command:

echo 'abcd efgh ijkl mnop' | awk '{for (i=1;i <= NF;i++) {sub(".",substr(toupper($i),1,1),$i)} print}' --> Abcd Efgh Ijkl Mnop

Making the remaining letters lowercase is easily accomplished by preceding the awk command with a tr command:

echo 'aBcD EfGh ijkl MNOP' | tr 'A-Z' 'a-z' | awk '{for (i=1;i <= NF;i++) {sub(".",substr(toupper($i),1,1),$i)} print}' --> Abcd Efgh Ijkl Mnop

However, in the interest of learning more about awk, I wanted to change the case of all but the first letter to lowercase with a similar awk construct. I used the regular expression \B[A-Za-z]+ to match all letters of a word but the first, and the awk command substr(tolower($i),2) to provide those same letters in lowercase, as follows:

echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1;i <= NF;i++) {sub("\B[A-Za-z]+",substr(tolower($i),2),$i)} print}' --> Abcd EFGH IJKL MNOP

Notice that the first word converted properly, but the remaining words are left unchanged. I would be very grateful for an explanation of why the remaining words did not convert properly and how to get them to do so.


Answer 1:


The issue is that \B (the zero-width non-word boundary) appears to match only in the first field, so $1 is substituted but $2 and the following fields do not match the regex and remain uppercase. \B should match inside any word, so this output suggests that the awk in use treats \B as a literal B (\B is a GNU awk extension), and ABCD is the only word that contains a B:

echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1; i<=NF; ++i) { print match($i, /\B/); }}'
2   # \B matches ABCD at 2nd character as expected
0   # no match for EFGH
0   # no match for IJKL
0   # no match for MNOP

Anyway, to capitalize only the first character of the line, you can operate on $0 (the whole line) instead of using a for loop:

echo 'ABCD EFGH IJKL MNOP' | awk '{print toupper(substr($0,1,1)) tolower(substr($0,2)) }'

Or if you still wanted to capitalize each word separately but with awk only:

awk '{for (i=1; i<=NF; ++i) { $i=toupper(substr($i,1,1)) tolower(substr($i,2)); } print }'
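The loop above splits on whitespace, so a token such as foo-bar counts as one word, and assigning to $i rebuilds the line with single spaces between fields. As a rough sketch of the question's stricter definition (a word is a run of alphabetic characters, everything else is a separator), the line can be walked with match() instead. The function name capitalize_words is hypothetical, and the sketch assumes ASCII letters only:

```shell
# Hypothetical helper: capitalize every run of ASCII letters on each input line,
# leaving all separator characters (spaces, punctuation, digits) untouched.
capitalize_words() {
  awk '{
    out = ""
    rest = $0
    # match() sets RSTART and RLENGTH for the leftmost run of letters
    while (match(rest, /[A-Za-z]+/)) {
      word = substr(rest, RSTART, RLENGTH)
      out = out substr(rest, 1, RSTART - 1) toupper(substr(word, 1, 1)) tolower(substr(word, 2))
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

echo 'aBcD-EfGh  ijkl' | capitalize_words   # prints: Abcd-Efgh  Ijkl
```

Because the fields are never reassigned, the original spacing is preserved.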



Answer 2:


When matching a regex with sub() or related functions (such as gsub()), it is best to use the following form:

sub(/regex/, replacement, target)

This is different from what you have:

sub("regex", replacement, target)

So your command becomes (with GNU awk, which supports \B and \w):

awk '{ for (i=1;i<=NF;i++) sub(/\B\w+/, substr(tolower($i),2), $i) }1'

Results:

Abcd Efgh Ijkl Mnop

This article on String Functions may be worth a read. HTH.
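To illustrate why the two forms differ, here is a small sketch using only portable awk features. A string used as a regex goes through string escape processing before it is interpreted as a regex, so a backslash has to be doubled there, while a regex constant needs only one. The function name replace_dot_both_ways is made up for this example:

```shell
# Hypothetical demo: replace the first literal dot using both regex forms.
replace_dot_both_ways() {
  awk '{
    a = $0; sub(/\./, "-", a)     # regex constant: a single backslash is enough
    b = $0; sub("\\.", "-", b)    # string: awk first turns \\. into \. and then compiles it
    print a, b
  }'
}

echo 'a.b.c' | replace_dot_both_ways   # prints: a-b.c a-b.c
```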


I should say that there are easier ways to accomplish what you want, for example using GNU sed:

sed -r 's/\B\w+/\L&/g'



Answer 3:


My solution would be to pass substr($i,2) as the first argument of sub() instead of your regex:

echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1 ; i <= NF ; i++) {sub(substr($i,2),tolower(substr($i,2)),$i)} print }'
Abcd Efgh Ijkl Mnop



Answer 4:


You have to add another \ character before \B:

echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1;i <= NF;i++)
{sub("\\B[A-Za-z]+",substr(tolower($i),2),$i)} print}'

With just \B awk gave me this warning:

awk: cmd. line:1: warning: escape sequence `\B' treated as plain `B'
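That warning also accounts for the result in the question: if "\B" is read as a plain B, the pattern becomes B[A-Za-z]+, and ABCD is the only input word that contains a B. The sketch below writes the literal B explicitly to reproduce the same output. The function name lower_after_literal_b is hypothetical:

```shell
# Hypothetical helper: the loop from the question, with a literal B
# written where the question had "\B".
lower_after_literal_b() {
  awk '{
    for (i = 1; i <= NF; i++)
      sub("B[A-Za-z]+", substr(tolower($i), 2), $i)
    print
  }'
}

echo 'ABCD EFGH IJKL MNOP' | lower_after_literal_b   # prints: Abcd EFGH IJKL MNOP
echo 'XBYZ QRST' | lower_after_literal_b             # prints: Xbyz QRST
```

The second call shows that any word with a B in second position converts, which is consistent with the pattern matching a literal letter and not a word boundary.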



Source: https://stackoverflow.com/questions/14139672/changing-the-case-of-a-string-with-awk
