Bash one-liner to mask data in file

Question

I have a file which is quite big. I need to mask all characters in specific positions for a specific record type. I have searched all over the place but cannot find a solution to this seemingly simple task. Here is an example:

File name: hello.txt

File:

0120140206INPUT FILE
1032682842 MR SIMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 MR GRIFFIN
20231458 Spooner Street
3034560817 RED
3001

What I would like to do is mask positions 12-16 of all lines beginning with "10", like this:

0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Answer 1:

Using sed

sed -r '/^10/ s/^(.{11}).{5}/\1XXXXX/' file

0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Explanation

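-r (--regexp-extended) enables extended regular expressions in GNU sed, so groups and {n} need no backslashes; -E is the equivalent spelling on GNU and BSD sed.
/^10/ selects only the lines beginning with 10.
s/^(.{11}).{5}/\1XXXXX/ keeps the first 11 characters and replaces the next 5 (positions 12-16) with XXXXX.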

With the same idea, if your awk is gawk and supports the gensub() function (the third argument, 1, selects the first match; the target defaults to $0):

awk '{$0=gensub(/^(10.{9}).{5}/,"\\1XXXXX",1)}1' file

Update: @tripleee provided a shorter one:

sed -r 's/^(10.{9}).{5}/\1XXXXX/' file
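The column arithmetic in these patterns (keep start minus one characters, then replace the next len) can be parameterized. The following is a minimal sketch, not part of the original answer: mask_cols is a hypothetical helper name, and it assumes the prefix contains no regex metacharacters and that the available sed accepts -E.

```shell
#!/usr/bin/env bash
# mask_cols PREFIX START LEN  (hypothetical helper, for illustration only)
# Reads stdin; on lines beginning with PREFIX, replaces LEN characters
# starting at 1-based position START with X. Assumes PREFIX has no
# regex metacharacters.
mask_cols() {
    local prefix=$1 start=$2 len=$3
    local mask
    # Build a string of LEN "X" characters.
    mask=$(printf '%*s' "$len" '' | tr ' ' 'X')
    # Keep the first START-1 characters, replace the next LEN.
    sed -E "/^${prefix}/ s/^(.{$((start - 1))}).{${len}}/\1${mask}/"
}

printf '%s\n' '0120140206INPUT FILE' '1032682842 MR SIMPSON' | mask_cols 10 12 5
```

With the arguments 10 12 5 this builds the same expression as the first sed command above, so the second sample line comes out as 1032682842 XXXXXMPSON and the first is left alone.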

Answer 2:

This can be a way:

$ awk 'BEGIN{FS=OFS=""} $1$2=="10" {for(i=12;i<=16;i++) $i="X"}1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Explanation

BEGIN{FS=OFS=""} set field separator as "", so that first char will be first field, 2nd char will be 2nd field...
$1$2=="10" {for(i=12;i<=16;i++) $i="X"} if the first char is 1 and the second 0, then change from the 12th to the 16th characters to X.
1 true condition, which is evaluated as the default awk behaviour: {print $0}.
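Splitting on an empty FS is an extension (gawk supports it, among others) rather than POSIX behaviour, so a version built only on substr() may travel better. The sketch below is an assumption-laden illustration, not the answerer's code: mask_awk is a hypothetical helper name, and it deliberately skips lines too short to contain the whole range.

```shell
#!/usr/bin/env bash
# mask_awk PREFIX START LEN  (hypothetical helper, for illustration only)
# Reads stdin; uses only substr(), index() and length().
mask_awk() {
    awk -v pre="$1" -v start="$2" -v len="$3" '
        BEGIN {
            # Build a mask of len X characters.
            mask = sprintf("%" len "s", "")
            gsub(/ /, "X", mask)
        }
        # The prefix sits at column 1 and the line reaches the end of the range.
        index($0, pre) == 1 && length($0) >= start + len - 1 {
            $0 = substr($0, 1, start - 1) mask substr($0, start + len)
        }
        { print }
    '
}

printf '%s\n' '1032682842 MR GRIFFIN' '20231458 Spooner Street' | mask_awk 10 12 5
```

Using index() instead of a regex match means the prefix is compared literally, which avoids surprises if the record type ever contains characters such as a dot.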

Answer 3:

This awk can work (substr takes 5 characters starting at position 12, and gsub turns each of them into X):

awk '/^10/{q=substr($0, 12, 5); gsub(/./, "X", q); $0=substr($0, 1, 11) q substr($0, 17)}1' file

Answer 4:

This should do:

awk '/^10/{q=substr($0,1,11);r=substr($0,17); $0=q "XXXXX" r }1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Answer 5:

This might work for you (GNU sed):

sed -r '/^10/{s/^(.{0,11})(.{0,5})/\1\n\2\n/;h;s/[^\n]/X/g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/}' file

For lines beginning with 10: place two markers either side of the intended mask, copy, replace all characters other than the markers with the mask character, append the copy and manipulate the text between the markers to position the mask.

N.B. This caters for short lines and does not introduce artefacts.
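The short-line behaviour described here (mask only the characters that actually exist in the range) can also be illustrated in plain bash. This is a rough sketch under my own assumptions, not a translation of the sed script: mask_range is a hypothetical helper name, and the positions 12-16 are hard-coded.

```shell
#!/usr/bin/env bash
# mask_range LINE  (hypothetical helper, for illustration only)
# Masks positions 12-16 of LINE, or as many of them as are present.
mask_range() {
    local line=$1
    local start=11 len=5                 # 0-based offset 11 is position 12
    local head="${line:0:start}"
    local mid="${line:start:len}"        # shorter than len on short lines
    local tail="${line:start+len}"
    # Replace every character of the middle slice with X.
    printf '%s%s%s\n' "$head" "${mid//?/X}" "$tail"
}

mask_range '1032682842 MR SIMPSON'   # full-length record
mask_range '1032682842 MR'           # only two characters fall in the range
mask_range '10AB'                    # nothing at position 12, left unchanged
```

The caller is still expected to check that the line begins with 10 before masking; the helper only handles the slicing.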

Answer 6:

You can use gawk's fixed-width data reading capability:

gawk -v FIELDWIDTHS="11 5 9999" -v OFS="" '/^10/ { $2 = "XXXXX" } ; { print }' file

See https://www.gnu.org/software/gawk/manual/gawk.html#Constant-Size.

Answer 7:

You can use BASH:

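while read -r f1 f2; do
    if [[ $f1 =~ ^10 ]]; then
        # mask the first 5 characters of the second field
        f2="XXXXX${f2:5}"
    fi
    # print the separator only when a second field exists
    echo "$f1${f2:+ $f2}"
done < hello.txt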

This will work if you only need to replace the first 5 characters of the second field with XXXXX.

If you need to replace the 12th through the 16th characters with XXXXX regardless of field, slice the line by offset instead:

while IFS= read -r l; do
    if [[ $l =~ ^10 ]]; then
        # keep characters 1-11, mask 12-16, keep the rest
        l="${l:0:11}XXXXX${l:16}"
    fi
    printf '%s\n' "$l"
done < hello.txt

Answer 8:

A Perl alternative (note that -i edits the file in place):

perl -p -i -e 's/^(10\d* )[A-Z ]{5}/$1XXXXX/' filename.txt

Source: https://stackoverflow.com/questions/21624098/bash-one-liner-to-mask-data-in-file

Tags

bash

sed

awk