Question
I have a 4-column CSV file, e.g.:
0001 @ fish @ animal @ eats worms
I use sed to do a find and replace on the file, but I need to limit this find and replace to only the text found inside column 3.
How can I have a find and replace occur only on this one column?
Answer 1:
Are you sure you want to be using sed? What about csvfix? Is your CSV nice and simple, with no quotes, embedded commas, or other nasties that make regexes a less than satisfactory way of dealing with a general CSV file? I'm assuming that the @ is the 'comma' in your format.
Consider using awk instead of sed:
awk -F@ '$3 ~ /pattern/ { OFS = "@"; $3 = "replace" } { print }'
The trailing { print } writes every line, modified or not; without it, awk would produce no output. Arguably, you should have a BEGIN block that sets OFS once. For one line of input, it makes no odds (and you'd probably be hard-pressed to measure a difference on a million lines of input, too):
$ echo "pattern @ pattern @ pattern @ pattern" |
> awk -F@ '$3 ~ /pattern/ { OFS = "@"; $3 = "replace" } { print }'
pattern @ pattern @replace@ pattern
$
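The BEGIN-block variant mentioned above could look like the following minimal sketch. The function name replace_col3 is made up for illustration; the logic is the same whole-field replacement, with FS and OFS set once before any input is read.

```shell
# Hypothetical helper: replace column 3 with "replace" when it contains "pattern".
replace_col3() {
  awk 'BEGIN { FS = OFS = "@" }            # set both separators once, up front
       $3 ~ /pattern/ { $3 = "replace" }   # assigning to $3 rebuilds $0 using OFS
       { print }                           # print every line, changed or not
  '
}

printf '%s\n' 'pattern @ pattern @ pattern @ pattern' \
              'pattern @ pattern @ other @ pattern' | replace_col3
```

Lines whose third field does not match are printed exactly as they were read, because awk only rebuilds the line when a field is assigned.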
If sed still seems appealing, then:
sed '/^\([^@]*@[^@]*\)@pattern@\(.*\)/ s//\1@replace@\2/'
For example (note the slightly different input and output; you can adapt it to handle the same input as the awk version quite easily if need be):
$ echo "pattern@pattern@pattern@pattern" |
> sed '/^\([^@]*@[^@]*\)@pattern@\(.*\)/ s//\1@replace@\2/'
pattern@pattern@replace@pattern
$
The first regex looks for the start of a line, a field of non-at-signs, an at-sign, another field of non-at-signs and remembers the lot; it looks for an at-sign, the pattern (which must be in the third field since the first two fields have been matched already), another at-sign, and then the residue of the line. When the line matches, then it replaces the line with the first two fields (unchanged, as required), then adds the replacement third field, and the residue of the line (unchanged, as required).
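One way to make the sed version accept the same space-padded input as the awk example is sketched below. It assumes the third field only needs to contain the pattern, and the function name sed_replace_col3 is hypothetical.

```shell
# Hypothetical helper: like the sed command above, but the third field only has
# to contain "pattern", so padding spaces around the field are tolerated.
sed_replace_col3() {
  # \1 = first two fields with their trailing @, \2 = third @ and the rest.
  sed 's/^\([^@]*@[^@]*@\)[^@]*pattern[^@]*\(@.*\)$/\1replace\2/'
}

echo 'pattern @ pattern @ pattern @ pattern' | sed_replace_col3
```

As with the awk version, the whole third field (spaces included) is replaced, so the result is pattern @ pattern @replace@ pattern.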
If you need to edit rather than simply replace the third field, then think about using awk, Perl, or Python. If you are still constrained to sed, then explore using the hold space to hold part of the line while you manipulate the other part in the pattern space, and end up re-integrating your desired output line from the hold space and pattern space before printing the line. That's nearly as messy as it sounds; actually, possibly even messier than it sounds. I'd go with Perl (because I learned it long ago and it does this sort of thing quite easily), but you can use whichever non-sed tool you like.
Perl editing the third field. Note that the default output is $_, which has to be reassembled from the auto-split fields in the array @F.
$ echo "pattern@pattern@pattern@pattern" | sh -x xxx.pl
> perl -pa -F@ -e '$F[2] =~ s/\s*pat(\w\w)rn\s*/ prefix-$1-suffix /; $_ = join "@", @F; ' "$@"
pattern@pattern@ prefix-te-suffix @pattern
$
An explanation. The -p means 'loop, reading lines into $_ and printing $_ at the end of each iteration'. The -a means 'auto-split $_ into the array @F'. The -F@ means the field separator is @. The -e is followed by the Perl program. Arrays are indexed from 0 in Perl, so the third field is split into $F[2] (the sigil, the @ or $, changes depending on whether you're working with a value from the array or the array as a whole). The =~ is the binding operator; it applies the regex on the RHS to the value on the LHS. The substitute pattern recognizes zero or more spaces (\s*) followed by pat, then two 'word' characters which are remembered into $1, then rn and zero or more spaces again; maybe there should be a ^ and $ in there to bind to the start and end of the field. The replacement is a space, 'prefix-', the remembered pair of letters, '-suffix', and a space. The $_ = join "@", @F; reassembles the input line $_ from the possibly modified separate fields, and then the -p prints that out. Not quite as tidy as I'd like (so there's probably a better way to do it), but it works. And you can do arbitrary transforms on arbitrary fields in Perl without much difficulty. Perl also has a module Text::CSV (and a high-speed C version, Text::CSV_XS) which can handle really complex CSV files.
Answer 2:
Essentially break the line into three pieces, with the pattern you're looking for in the middle. Then keep the outer pieces and replace the middle.
/\([^@]*@[^@]*@[^@]*\)pattern\([^@]*@.*\)/s//\1replacement\2/
\([^@]*@[^@]*@[^@]*\) - gather everything before the pattern, including the 2nd @ and any text in the third field before the match; this becomes \1
pattern - the thing you're looking for
\([^@]*@.*\) - gather everything after the pattern; this becomes \2
Then change that line into \1, then the replacement, then everything after pattern, which is \2. If a line can have more than four fields, anchor the regex with a leading ^ so that the match cannot start at a later field.
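A quick way to try this approach is sketched below, using the question's sample line. The function name sed_edit_col3, the words animal and mammal, and the second input line are all made up for illustration, and the leading ^ is an addition to the regex above.

```shell
# Hypothetical helper: change "animal" to "mammal", but only inside column 3.
# The leading ^ is an addition that pins the match to the start of the line.
sed_edit_col3() {
  sed '/^\([^@]*@[^@]*@[^@]*\)animal\([^@]*@.*\)/s//\1mammal\2/'
}

printf '%s\n' '0001 @ fish @ animal @ eats worms' \
              '0002 @ worm @ bait @ animal food' | sed_edit_col3
```

The second line is printed unchanged: its only "animal" is in column 4, and the regex requires another @ after the matched text.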
Answer 3:
This might work for you:
echo 0001 @ fish @ animal @ eats worms|
sed 's/@/&\n/2;s/@/\n&/3;h;s/\n@.*//;s/.*\n//;y/a/b/;G;s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/'
0001 @ fish @ bnimbl @ eats worms
Explanation:
- Define the field to be worked on (in this case the 3rd) and insert a newline (\n) before it and directly after it: s/@/&\n/2;s/@/\n&/3
- Save the line in the hold space: h
- Delete the fields on either side: s/\n@.*//;s/.*\n//
- Now process the field, i.e. change all a's to b's: y/a/b/
- Now append the original line: G
- Substitute the new field for the old field (also removing any newlines): s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/
N.B. In step 4 the pattern space contains only the defined field, so any number of commands may be carried out there without affecting the rest of the line.
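The sed one-liner relies on \n being treated as a newline in the replacement text, which GNU sed does but other sed implementations may not. For comparison, here is a rough awk sketch of the same "edit one field only" idea, assuming @ as the separator; the function name edit_col and its column argument are hypothetical.

```shell
# Hypothetical helper: change every "a" to "b" in one column, chosen by number.
edit_col() {
  awk -v col="$1" 'BEGIN { FS = OFS = "@" }
       { gsub(/a/, "b", $col); print }'    # gsub touches only field number col
}

echo '0001 @ fish @ animal @ eats worms' | edit_col 3
```

This prints 0001 @ fish @ bnimbl @ eats worms, the same result as the sed command, and the "a" in "eats" is untouched because it sits in column 4.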
Source: https://stackoverflow.com/questions/10050753/how-to-restrict-a-find-and-replace-to-only-one-column-within-a-csv