Question
I have a fairly large file and need to mask all characters at specific positions in records of a specific type. I have searched all over the place but cannot find a solution to this seemingly simple task. Here is an example:
File name: hello.txt
File:
0120140206INPUT FILE
1032682842 MR SIMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 MR GRIFFIN
20231458 Spooner Street
3034560817 RED
3001
What I would like to do is mask positions 12-16 of all lines beginning with "10", like this:
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
Answer 1:
Using sed
sed -r '/^10/ s/^(.{11}).{5}/\1XXXXX/' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
Explanation
-r
enables extended regular expressions in GNU sed (long form: --regexp-extended).
/^10/
selects the lines beginning with 10.
s/^(.{11}).{5}/\1XXXXX/
keeps the first 11 characters and replaces positions 12-16 with XXXXX.
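To try the command without creating hello.txt, the sample records can be piped in directly. The following is a minimal sketch, assuming a sed that accepts -E (the portable spelling of -r, understood by current GNU and BSD sed); mask_10 is a hypothetical helper name, not part of the original answer.

```shell
# Hypothetical helper: mask positions 12-16 on records of type "10".
# Reads the named files, or stdin when no file is given.
mask_10() {
    sed -E '/^10/ s/^(.{11}).{5}/\1XXXXX/' "$@"
}

# Two sample records on stdin; only the one starting with "10" is changed.
printf '%s\n' '1032682842 MR SIMPSON' '20231458 742 Evergreen Terrace' | mask_10
```

Lines shorter than 16 characters do not match the pattern and pass through unchanged.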
With the same idea, if your awk is gawk and supports the gensub() function (its third argument selects which match to replace; the target defaults to $0):
awk '{$0=gensub(/^(10.{9}).{5}/,"\\1XXXXX",1)}1' file
Update: @tripleee provided a shorter one:
sed -r 's/^(10.{9}).{5}/\1XXXXX/' file
Answer 2:
Here is one way:
$ awk 'BEGIN{FS=OFS=""} $1$2=="10" {for(i=12;i<=16;i++) $i="X"}1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
Explanation
BEGIN{FS=OFS=""}
sets the field separator to the empty string, so that the first character becomes the first field, the second character the second field, and so on. Splitting on an empty FS is an extension supported by gawk and mawk; POSIX leaves it unspecified.
$1$2=="10" {for(i=12;i<=16;i++) $i="X"}
if the first character is 1 and the second is 0, changes the 12th through the 16th characters to X.
1
a true condition, which triggers the default awk action, {print $0}.
Answer 3:
This awk command works (it masks with * here; change "*" to "X" to match the requested output):
awk '/^10/{q=substr($0, 12, 5); gsub(/./, "*", q); $0=substr($0, 1, 11) q substr($0, 17)}1' file
Answer 4:
This should do:
awk '/^10/{q=substr($0,1,11);r=substr($0,17); $0=q "XXXXX" r }1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
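The substr approach generalizes if the record type and the column range are passed in as awk variables. What follows is a rough sketch, assuming 1-based inclusive positions; mask_range and its argument order are hypothetical, and lines too short to contain the whole range are deliberately left untouched.

```shell
# Hypothetical helper: mask_range TYPE FROM TO [FILE...]
# Masks positions FROM..TO (1-based, inclusive) on lines starting with TYPE.
mask_range() {
    rec_type=$1 from=$2 to=$3
    shift 3
    awk -v t="$rec_type" -v from="$from" -v to="$to" '
        index($0, t) == 1 && length($0) >= to {
            mask = ""
            for (i = from; i <= to; i++) mask = mask "X"   # one X per masked column
            $0 = substr($0, 1, from - 1) mask substr($0, to + 1)
        }
        1' "$@"
}

# Same masking as in the question: type 10, columns 12-16.
printf '%s\n' '1032682842 MR SIMPSON' '3034560817 GREEN' | mask_range 10 12 16
```

Using index($0, t) == 1 rather than a regular expression keeps the type prefix literal, so a prefix containing regex metacharacters would still be compared as plain text.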
Answer 5:
This might work for you (GNU sed):
sed -r '/^10/{s/^(.{0,11})(.{0,5})/\1\n\2\n/;h;s/[^\n]/X/g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/}' file
For lines beginning with 10: place a marker on either side of the intended mask, copy the line to the hold space, replace all characters other than the markers with the mask character, append the copy, and then use the markers to splice the masked region between the original head and tail.
N.B. This caters for short lines and does not introduce artefacts.
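The short-line behaviour mentioned above can also be had with plain substr calls in awk. This is only a sketch, under the assumption that a partially present range should be masked character for character; mask_short_safe is a hypothetical name.

```shell
# Hypothetical helper: mask whatever exists in positions 12-16 of "10" records,
# without padding lines that are shorter than 16 characters.
mask_short_safe() {
    awk '/^10/ {
        head = substr($0, 1, 11)   # characters 1-11, kept unchanged
        mid  = substr($0, 12, 5)   # up to 5 characters, fewer on short lines
        tail = substr($0, 17)      # everything after position 16
        gsub(/./, "X", mid)        # one X per character actually present
        $0 = head mid tail
    } 1' "$@"
}

# A full-length record and a record that ends inside the masked range.
printf '%s\n' '1032682842 MR SIMPSON' '1032682842 MR' | mask_short_safe
```

Because the mask is built from the characters that are present, a 13-character record gains no extra columns.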
Answer 6:
You can use gawk's fixed-width data reading capability:
gawk -v FIELDWIDTHS="11 5 9999" -v OFS="" '/^10/ { $2 = "XXXXX" } ; { print }' file
See https://www.gnu.org/software/gawk/manual/gawk.html#Constant-Size.
Answer 7:
You can use Bash:
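while read -r f1 f2; do
    if [[ $f1 =~ ^10 ]]; then
        # mask the first 5 characters of the second field
        f2="XXXXX${f2:5}"
    fi
    # quote the output so spacing inside f2 survives; skip the space if f2 is empty
    echo "$f1${f2:+ $f2}"
done < hello.txt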
This will work if you only need to replace the first 5 characters of the second field with XXXXX.
If you need to replace the 12th through the 16th characters with XXXXX regardless of field, you could use:
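while IFS= read -r l; do
    # only touch "10" records long enough to contain positions 12-16
    if [[ $l =~ ^10 && ${#l} -ge 16 ]]; then
        # keep characters 1-11, mask 12-16, keep the rest
        l="${l:0:11}XXXXX${l:16}"
    fi
    printf '%s\n' "$l"
done < hello.txt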
Answer 8:
The Perl alternative (note that -i edits the file in place):
perl -p -i -e 's/^(10.{9}).{5}/$1XXXXX/' filename.txt
Source: https://stackoverflow.com/questions/21624098/bash-one-liner-to-mask-data-in-file