Question
I'm using awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt > out.txt to remove both leading and trailing whitespace.
The problem is that the output file still has trailing whitespace! All lines have the same length: they are right-padded with spaces.
What am I missing?
UPDATE 1
The problem is probably due to the fact that the trailing spaces are not "normal" spaces but \x20 characters (DC4).
UPDATE 2
I used gsub(/[[:cntrl:]]|[[:space:]]|\x20/,"") and it worked.
Two strange things:
Why isn't \x20 considered a control character?
Using [[:cntrl:][:space:]\x20] does NOT work. Why?
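Note that the gsub from UPDATE 2 has no ^ or $ anchors, so it deletes every matching character on the line, including the spaces between words, not only the padding. A minimal check, assuming an awk that supports POSIX bracket classes (such as gawk); the \x20 alternative is omitted because a space already matches [[:space:]], and the helper name is hypothetical:

```shell
# strip_all_ws is a hypothetical helper wrapping the unanchored gsub from UPDATE 2
# (\x20 dropped: an ordinary space is already in [[:space:]]).
strip_all_ws() {
    awk '{ gsub(/[[:cntrl:]]|[[:space:]]/, ""); print }'
}

# Leading spaces, an inner space, then space + DC4 (octal 024) + spaces at the end.
printf '  two words \024  \n' | strip_all_ws   # prints "twowords": the inner space is gone too
```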
Answer 1:
This command works for me:
$ awk '{$1=$1}1' file.txt
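Why this works: assigning to a field (even $1=$1) makes awk rebuild $0 from its fields joined with OFS, a single space by default, and the trailing 1 is an always-true pattern that prints the record. With the default FS, leading and trailing blanks are not part of any field, so they disappear; as a side effect, runs of blanks inside the line are squeezed to a single space. Control characters such as DC4 are not separators under the default FS, so they are kept. A minimal sketch (the function name is hypothetical):

```shell
# squeeze_ws is a hypothetical name wrapping the one-liner above.
squeeze_ws() {
    # $1=$1 forces awk to rebuild $0 from its fields joined by OFS (" " by default);
    # the pattern 1 is always true, so every rebuilt record is printed.
    awk '{$1=$1}1'
}

printf '  \tfoo   bar \t \n' | squeeze_ws        # prints "foo bar"
printf 'TEXT  \024\n' | squeeze_ws | od -An -tx1  # bytes 54 45 58 54 20 14 0a: DC4 survives
```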
Answer 2:
Your code works for me.
You may have characters other than spaces and tabs; hexdump -C can help you check what is wrong:
awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt | hexdump -C | less
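For example, if a line ends with a DC4 byte (0x14, written \024 in octal for printf) after the padding, the original command leaves the line untouched: the last character is neither a space nor a tab, so [ \t]+$ cannot match. The sketch below uses od -An -tx1 (POSIX od) instead of hexdump -C, which may not be installed everywhere; the helper name is hypothetical:

```shell
# trim_blanks is a hypothetical name for the command from the question.
trim_blanks() {
    awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}'
}

# "TEXT", two spaces, then DC4: DC4 sits between the spaces and the end of the line.
printf 'TEXT  \024\n' | trim_blanks | od -An -tx1   # bytes 54 45 58 54 20 20 14 0a: unchanged
printf ' \tTEXT \t\n' | trim_blanks                 # ordinary blanks are trimmed: prints "TEXT"
```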
UPDATE:
OK, you identified DC4 (there may be other control characters as well).
You can then extend your command to strip control characters too:
awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' in.txt > out.txt
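A quick check of this command, as a sketch: [[:cntrl:]] covers DC4 and the carriage return, and [[:space:]] covers spaces and tabs, so both ends are cleaned while the space inside the text is kept. The helper name is hypothetical, and an awk that supports POSIX bracket classes (such as gawk) is assumed:

```shell
# trim_ctrl is a hypothetical name for the improved command above.
trim_ctrl() {
    awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}'
}

# Leading space/DC4/tab and trailing space/DC4/space/CR are removed; the inner space stays.
printf ' \024\tTEXT here \024 \r\n' | trim_ctrl   # prints "TEXT here"
```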
See the awk manpage:
[:alnum:]   Alphanumeric characters.
[:alpha:]   Alphabetic characters.
[:blank:]   Space or tab characters.
[:cntrl:]   Control characters.
[:digit:]   Numeric characters.
[:graph:]   Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
[:lower:]   Lower-case alphabetic characters.
[:print:]   Printable characters (characters that are not control characters).
[:punct:]   Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:]   Space characters (such as space, tab, and formfeed, to name a few).
[:upper:]   Upper-case alphabetic characters.
[:xdigit:]  Characters that are hexadecimal digits.
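This also explains the \x20 question: 0x20 is the ordinary space, which belongs to [:space:] and [:blank:] but not to [:cntrl:], whereas DC4 is 0x14. The sketch below asks awk's own regex engine which classes a few bytes fall into; it assumes an awk whose sprintf("%c", n) turns a number into the corresponding byte (gawk and mawk both do) and an ASCII-compatible locale. The helper name is hypothetical:

```shell
# classify_bytes is a hypothetical helper: for each byte code it reports
# membership (1 or 0) in the [:cntrl:], [:space:] and [:blank:] classes.
classify_bytes() {
    awk 'BEGIN {
        n = split("32 9 20 13", codes, " ")   # 0x20 space, 0x09 tab, 0x14 DC4, 0x0d CR
        for (i = 1; i <= n; i++) {
            code = codes[i] + 0                 # force a numeric value for %c
            c = sprintf("%c", code)
            printf "0x%02x cntrl=%d space=%d blank=%d\n", code,
                (c ~ /[[:cntrl:]]/), (c ~ /[[:space:]]/), (c ~ /[[:blank:]]/)
        }
    }'
}

classify_bytes
# 0x20 cntrl=0 space=1 blank=1
# 0x09 cntrl=1 space=1 blank=1
# 0x14 cntrl=1 space=0 blank=0
# 0x0d cntrl=1 space=1 blank=0
```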
Leading/Trailing 0x20 removal
The command works for me; I tested it like this:
$ echo -e "\x20 \tTEXT\x20 \t" | hexdump -C
00000000  20 20 09 54 45 58 54 20  20 09 0a                 |  .TEXT  ..|
0000000b
$ echo -e "\x20 \tTEXT\x20 \t" | awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' | hexdump -C
00000000  54 45 58 54 0a                                    |TEXT.|
00000005
However, if you have 0x20 in the middle of your text, it is not removed.
But that is not what you were asking, is it?
Answer 3:
Your files probably have Windows line endings. That means each line ends with \r\n, and with the default record separator awk splits records on \n only, so the \r stays as the last character of every line. The pattern [ \t]+$ therefore cannot match: the spaces and tabs come before the \r, not at the end of the line. Try running the file through tr -d '\r' before sending it to awk.
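A minimal reproduction of this case, simulating a CRLF line with printf: without the tr step the two spaces and the \r (0d) survive; with it, the line is trimmed. The helper name is hypothetical:

```shell
# trim_blanks is a hypothetical name for the command from the question.
trim_blanks() {
    awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}'
}

# CRLF line: \r is the last character of the record, so [ \t]+$ cannot reach the padding.
printf 'TEXT  \r\n' | trim_blanks | od -An -tx1   # bytes 54 45 58 54 20 20 0d 0a
# Deleting every \r first lets the trailing-blank pattern match.
printf 'TEXT  \r\n' | tr -d '\r' | trim_blanks    # prints "TEXT"
```

Applied to the files from the question, that would be tr -d '\r' < in.txt | awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' > out.txt.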
Answer 4:
You could also use Perl:
perl -lpe 's/^\s*(.*\S)\s*$/$1/' in.txt > out.txt
s/foo/bar/  substitute using regular expressions
^           beginning of string
\s*         zero or more whitespace characters
(.*\S)      any characters ending with a non-whitespace character; captured into $1
\s*         zero or more whitespace characters
$           end of string
Source: https://stackoverflow.com/questions/9175801/how-to-remove-leading-and-trailing-whitespaces