Using special characters in a string argument to the awk match function. Current locale settings

纵然是瞬间, submitted on 2019-12-23 12:45:21

Question


I have a problem using the match function in awk on a string containing special characters. Consider the file test.awk:

{
    match($0,"(^.*)kon",a);
    print a[1];
}

and a corresponding test file "test.txt" with the contents "Testing Håkon" (note the Norwegian character "å"). The file is encoded in ISO-8859-1 and is 14 bytes long. The hex dump of the file, given by xxd -p test.txt, is

54657374696e672048e56b6f6e0a

From this we can see that the Norwegian character "å" has been encoded as the single byte 0xE5, that is, the file uses ISO-8859-1 encoding.
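To reproduce the file independently of any editor's encoding settings, it can be written byte by byte. This is a sketch assuming a shell with `printf`, `od` (used here instead of `xxd`, which is not always installed) and GNU/glibc `iconv`. The last command shows, for comparison, how the same text looks in UTF-8, where "å" takes two bytes (c3 a5) instead of one.

```shell
# Recreate the question's test file: "Testing Håkon" in ISO-8859-1,
# where å is the single byte 0xE5 (octal 345).
printf 'Testing H\345kon\n' > test.txt

# Byte count and compact hex dump (same layout as xxd -p).
wc -c < test.txt                          # 14
od -An -tx1 test.txt | tr -d ' \n'; echo  # 54657374696e672048e56b6f6e0a

# The same text re-encoded as UTF-8: å becomes the two bytes c3 a5.
iconv -f ISO-8859-1 -t UTF-8 test.txt | od -An -tx1 | tr -d ' \n'; echo
# 54657374696e672048c3a56b6f6e0a
```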

Running

awk  -f test.awk test.txt

prints nothing at the terminal (only an empty line), whereas the expected output is "Testing Hå".

The output of running the locale command is:

LANG=en_DK.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_DK.UTF-8"
LC_NUMERIC="en_DK.UTF-8"
LC_TIME="en_DK.UTF-8"
LC_COLLATE="en_DK.UTF-8"
LC_MONETARY="en_DK.UTF-8"
LC_MESSAGES="en_DK.UTF-8"
LC_PAPER="en_DK.UTF-8"
LC_NAME="en_DK.UTF-8"
LC_ADDRESS="en_DK.UTF-8"
LC_TELEPHONE="en_DK.UTF-8"
LC_MEASUREMENT="en_DK.UTF-8"
LC_IDENTIFICATION="en_DK.UTF-8"
LC_ALL=

which shows that LANG, and with it every LC_* category, is set to a UTF-8 locale.
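The mismatch between file and locale can be checked from the shell (`locale charmap` prints just the locale's encoding, `UTF-8` with the settings above). This minimal sketch assumes glibc `iconv`, which exits with an error when its input is not valid in the source encoding, so converting UTF-8 to UTF-8 works as a quick validity test. The byte 0xE5 announces a three-byte UTF-8 sequence, but it is followed by "k" (0x6B), which is not a continuation byte, so under a UTF-8 locale this line is not well-formed text. That is presumably why `.*` in the regexp cannot step over it and match() finds nothing.

```shell
# Recreate the ISO-8859-1 test file (å = byte 0xE5).
printf 'Testing H\345kon\n' > test.txt

# iconv fails on the invalid sequence 0xE5 0x6B, so this reports
# "not valid UTF-8" for the ISO-8859-1 file.
if iconv -f UTF-8 -t UTF-8 test.txt > /dev/null 2>&1; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
```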


Answer 1:


This isn't a problem with awk (see here). Your locale expects UTF-8 encoding, but your file uses ISO-8859-1, so either set your locale to match your file or vice versa.
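A minimal sketch of the "vice versa" option: re-encode the file to UTF-8 so that it matches the locale. It assumes GNU/glibc `iconv`, and `test-utf8.txt` is just an illustrative file name. Because the three-argument form of match() is a gawk extension, the sketch swaps in an equivalent that uses only POSIX match() and `RLENGTH`, so it should also run under other awks such as mawk; this is a substitute technique, not the code from the answer.

```shell
# Recreate the ISO-8859-1 test file from the question.
printf 'Testing H\345kon\n' > test.txt

# "Vice versa": re-encode the file to match the UTF-8 locale.
# (test-utf8.txt is a hypothetical output name.)
iconv -f ISO-8859-1 -t UTF-8 test.txt > test-utf8.txt

# Portable stand-in for: match($0, "(^.*)kon", a); print a[1]
# RLENGTH covers everything up to and including the last "kon",
# so dropping its final 3 characters leaves the captured prefix.
awk 'match($0, /^.*kon/) { print substr($0, 1, RLENGTH - 3) }' test-utf8.txt
# Testing Hå
```

Because `^.*kon` is greedy, `RLENGTH - 3` strips only the last "kon", mirroring what the capture group `(^.*)` returns in the original regexp.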

Note: the second argument of match() should be a regexp constant (/.../) rather than a string, and the trailing semicolons are not required:

{
    match($0,/(^.*)kon/,a)
    print a[1]
}



Answer 2:


I modified your code as follows:

{
    match($0,"(^.*)kon",a);
    print ">>>" a[1] "<<<";
}

The result of running GNU Awk 3.1.6 under Windows 7:

>>>Hå<<<

Under Ubuntu running GNU Awk 3.1.8 I get:

>>><<<

To get the desired output, I had to temporarily change the locale settings and convert the output back to UTF-8:

LC_ALL=ISO_8859-1 awk -f test.awk test.txt | iconv -f ISO_8859-1 -t UTF-8
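The pipeline above works because awk ends up processing the file in a single-byte locale, and `iconv` then converts the ISO-8859-1 result for the UTF-8 terminal. `ISO_8859-1` does not look like a usual glibc locale name (those normally combine language, territory and encoding, like the `en_DK.UTF-8` shown earlier), so it may well be that the locale setting simply fails and awk falls back to the C locale; that is an assumption, not something the answer states. This sketch requests the C locale explicitly, which is always available, and again uses a portable `RLENGTH`-based rewrite in place of gawk's three-argument match().

```shell
# Recreate the ISO-8859-1 test file from the question.
printf 'Testing H\345kon\n' > test.txt

# In the C locale awk treats every byte as one character, so 0xE5 is
# matched by "." like any other byte. The printed prefix is still
# ISO-8859-1, so iconv converts it for display on a UTF-8 terminal.
LC_ALL=C awk 'match($0, /^.*kon/) { print substr($0, 1, RLENGTH - 3) }' test.txt |
    iconv -f ISO-8859-1 -t UTF-8
# Testing Hå
```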


Source: https://stackoverflow.com/questions/16760493/using-special-characters-in-a-string-argument-to-the-awk-match-function-current
