unix - cut command (adding own delimiter)

Submitted by 此生再无相见时 on 2019-12-23 17:56:50

Question


Given a file with data like this (i.e., a stores.dat file):

id               storeNo     type
2ttfgdhdfgh      1gfdkl-28   kgdl
9dhfdhfdfh       2t-33gdm    dgjkfndkgf

Desired output:

id               |storeNo     |type
2ttfgdhdfgh      |1gfdkl-28   |kgdl
9dhfdhfdfh       |2t-33gdm    |dgjkfndkgf

I would like to add a "|" delimiter between each of these three cut ranges:

cut -c1-18,19-30,31-40 stores.dat

What is the syntax to insert a delimiter between each cut?

Bonus points if you can also provide an option to trim the values, like so:

id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf

UPDATE: Thanks to Mat's answer, I ended up with the following working solution. It is a bit messy, but the bash version on my SunOS machine doesn't seem to support more elegant arithmetic.

#!/bin/bash
unpack=""
filename="$1"
shift    # the remaining arguments are the start-end ranges
while [ $# -gt 0 ] ; do
    firstcharpos=`echo "$1" | awk -F"-" '{print $1}'`
    secondcharpos=`echo "$1" | awk -F"-" '{print $2}'`
    # field width = end - start + 1
    compute=`expr $secondcharpos - $firstcharpos + 1`
    unpack=$unpack"A"$compute
    shift
done
# unpack now holds a template such as A17A12A10
perl -ne 'print join("|",unpack("'$unpack'", $_)), "\n";' "$filename"

Usage: sh test.sh input_file 1-17 18-29 30-39


Answer 1:


If you're not afraid of using perl, here's a one-liner:

$ perl -ne 'print join("|",unpack("A17A12A10", $_)), "\n";' input 

The unpack call extracts a 17-character string, then a 12-character one, then a 10-character one from the input line and returns them as a list (the A format strips trailing spaces). join then adds the | separators.

If you want the input columns to be in x-y format, without writing a "real" script, you could hack it like this (but it's ugly):

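#!/bin/bash
unpack=""

# All arguments except the last are ranges; the last is the input file.
while [ $# -gt 1 ] ; do
    # Arithmetic expansion treats "1-17" as a subtraction, giving -16.
    arg=$(($1))
    shift
    # Negate and add 1 to get the field width (17).
    unpack=$unpack"A"$((-1*$arg+1))
done

perl -ne 'print join("|",unpack("'$unpack'", $_)), "\n";' "$1"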

Usage: t.sh 1-17 18-29 30-39 input_file.
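
The range-to-width conversion used in the script above can also be sketched with plain POSIX parameter expansion instead of evaluating the range as an arithmetic expression. The function name ranges_to_template is hypothetical, and the sketch assumes every argument has the form start-end with start <= end:

```shell
#!/bin/sh
# Hypothetical helper: turn "start-end" ranges into a perl unpack template.
# Assumes each argument looks like 18-29 (two integers around one dash).
ranges_to_template() {
    template=""
    for range in "$@"; do
        start=${range%-*}    # text before the dash, e.g. 18
        end=${range#*-}      # text after the dash, e.g. 29
        width=$((end - start + 1))
        template="${template}A${width}"
    done
    printf '%s\n' "$template"
}

ranges_to_template 1-17 18-29 30-39    # prints A17A12A10
```

The resulting string can be passed to unpack in the perl one-liner in place of the hard-coded A17A12A10.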




Answer 2:


Since you used cut in your example, and assuming the fields are separated by tabs:

$ cut  --output-delimiter='|' -f1-3 input
id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf

If that is not the case, specify the input delimiter with the -d switch.
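
Note that --output-delimiter is a GNU cut extension, so it may not exist on other systems. For the fixed-width case in the question, one portable alternative is to cut each character range separately and glue the pieces together with paste. The following is only a sketch: the helper name cut_ranges is hypothetical, it assumes mktemp is available, and it keeps the column padding (the untrimmed variant of the desired output).

```shell
#!/bin/sh
# cut_ranges FILE RANGE...   (hypothetical helper name)
# Cuts each character range into its own temporary file, then lets
# paste join the pieces line by line with "|" between them.
cut_ranges() (
    file=$1
    shift
    tmpdir=$(mktemp -d) || exit 1
    n=0
    list=""
    for range in "$@"; do
        n=$((n + 1))
        cut -c"$range" "$file" > "$tmpdir/col$n"
        list="$list col$n"
    done
    # $list is deliberately unquoted: it holds the names col1 col2 ...
    (cd "$tmpdir" && paste -d'|' $list)
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)

# Demo on sample data laid out like stores.dat (widths 17, 12, 10).
sample=$(mktemp)
printf '%-17s%-12s%s\n' id storeNo type 2ttfgdhdfgh 1gfdkl-28 kgdl > "$sample"
cut_ranges "$sample" 1-17 18-29 30-39
rm -f "$sample"
```

Usage mirrors the script in the question: cut_ranges stores.dat 1-17 18-29 30-39.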




Answer 3:


I'd use awk:

awk '{print $1 "|" $2 "|" $3}'

Like some of the other suggestions, it assumes columns are whitespace separated, and doesn't care about the column numbers. If you have spaces in one of the fields, it won't work.
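
Under the same whitespace-separated assumption, the column count does not have to be hard-coded. A minimal sketch (the function name pipe_join is hypothetical) sets the output field separator and lets awk rebuild each record:

```shell
#!/bin/sh
# Join whitespace-separated columns with "|", however many there are.
# Assigning $1 to itself forces awk to rebuild the record using OFS.
pipe_join() {
    awk -v OFS='|' '{ $1 = $1; print }' "$@"
}

printf '%-17s%-12s%s\n' id storeNo type 2ttfgdhdfgh 1gfdkl-28 kgdl | pipe_join
```

Like the one-liner above, this splits any value that contains a space into separate columns.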




Answer 4:


A better awk solution, based on character position rather than whitespace (FIELDWIDTHS is a gawk extension):

$ awk -v FIELDWIDTHS='17 12 10' -v OFS='|' '{ $1=$1 ""; print }' stores.dat | tr -d ' '

id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf
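
For awk implementations without FIELDWIDTHS, the same idea can be sketched with substr. This version also trims only the trailing padding of each field, whereas tr -d ' ' removes every space, including any inside a value. The function name fixed_to_pipe and its argument order are assumptions made for illustration:

```shell
#!/bin/sh
# fixed_to_pipe 'W1 W2 ...' [FILE...]   (hypothetical helper name)
# Slices each line into fixed-width fields and joins them with "|".
fixed_to_pipe() {
    widths=$1
    shift
    awk -v widths="$widths" '
    BEGIN { n = split(widths, w, " ") }
    {
        out = ""
        pos = 1
        for (i = 1; i <= n; i++) {
            field = substr($0, pos, w[i])
            sub(/ +$/, "", field)          # trim trailing padding only
            out = out (i > 1 ? "|" : "") field
            pos += w[i]
        }
        print out
    }' "$@"
}

printf '%-17s%-12s%s\n' id storeNo type 2ttfgdhdfgh 1gfdkl-28 kgdl |
    fixed_to_pipe '17 12 10'
```

Usage with the file from the question would be: fixed_to_pipe '17 12 10' stores.dat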



Answer 5:


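Use sed to search and replace parts of a file based on regular expressions.

Replace each run of whitespace in infile1 with '|' (GNU sed understands the \t and \r escapes inside brackets; other sed implementations may treat them literally):

sed -e 's/[ \t\r][ \t\r]*/|/g' infile1 > outfile3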



Answer 6:


You can't do that with cut as far as I am aware, but you can do it easily with sed as long as the values in each column never have internal spaces:

sed -e 's/  */|/g'

EDIT: If the file format is a true fixed-column format, and you don't want to use perl as shown by Mat, this can be done with sed, but it's not pretty. sed's basic regular expressions don't accept the .{17} form of repetition (the POSIX spelling is .\{17\}), so the simplest approach is to type out the right number of dots:

sed -e 's/^\(.................\)\(............\)\(..........\)$/\1|\2|\3/; s/  *|/|/g'
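
The same substitution can be sketched with POSIX interval expressions instead of counted dots. The function name sed_fixed is hypothetical, the widths 17 and 12 are taken from the question, and the third column is left unanchored so that lines with a short last field still match; lines shorter than 29 characters pass through without delimiters.

```shell
#!/bin/sh
# Insert "|" after the first 17 and the next 12 characters, then trim
# the padding before each "|" and at the end of the line.
sed_fixed() {
    sed -e 's/^\(.\{17\}\)\(.\{12\}\)/\1|\2|/' \
        -e 's/  *|/|/g' \
        -e 's/  *$//' "$@"
}

printf '%-17s%-12s%s\n' id storeNo type 2ttfgdhdfgh 1gfdkl-28 kgdl | sed_fixed
```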



Answer 7:


How about using just the tr command?

tr -s " " "|" < stores.dat

From the man page:

-s      Squeeze multiple occurrences of the characters listed in the last
        operand (either string1 or string2) in the input into a single
        instance of the character.  This occurs after all deletion and
        translation is completed.

Test:

[jaypal:~/Temp] cat stores.dat 
id               storeNo     type
2ttfgdhdfgh      1gfdkl-28   kgdl
9dhfdhfdfh       2t-33gdm    dgjkfndkgf

[jaypal:~/Temp] tr -s " " "|" < stores.dat 
id|storeNo|type
2ttfgdhdfgh|1gfdkl-28|kgdl
9dhfdhfdfh|2t-33gdm|dgjkfndkgf

You can easily redirect this to a new file like this:

[jaypal:~/Temp] tr -s " " "|" < stores.dat > new.stores.dat

Note: As Mat pointed out in the comments, this solution assumes the columns are separated by one or more spaces rather than laid out at fixed widths, so a value that itself contains a space would be split.
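
A small sketch of that limitation, using a made-up value ("main store") that contains a space; the wrapper name squeeze_to_pipe is hypothetical:

```shell
#!/bin/sh
# Translate every space to "|" and squeeze runs of "|" into one.
squeeze_to_pipe() {
    tr -s ' ' '|'
}

# The first column holds a value with an internal space, so it is
# split into two columns.
printf '%-17s%-12s%s\n' 'main store' 1gfdkl-28 kgdl | squeeze_to_pipe
# prints: main|store|1gfdkl-28|kgdl
```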



Source: https://stackoverflow.com/questions/8630053/unix-cut-command-adding-own-delimiter
