I have a set of data as input and need the second-to-last field based on a delimiter. The lines may have different numbers of delimiters. How can I get the second-to-last field?
The cuts utility can do this:
$ cat file.txt
text,blah,blaah,foo
this,is,another,text,line
$ cuts -2 file.txt
blaah
text
cuts, which stands for "cut on steroids":
- automatically figures out the input field separators
- supports multi-char (and regexp) separators
- automatically pastes (side-by-side) multiple columns from multiple files
- supports negative offsets (from end of line)
- has good defaults to save typing + allows the user to override them
and much more.
I wrote cuts after being frustrated with the many limitations of cut on Unix. It is designed to replace various cut/paste combos, slicing and dicing columns from multiple files, with multiple separator variations, while requiring minimal typing from the user.
You can get cuts (free software, Artistic License) from GitHub: https://github.com/arielf/cuts/
Calling cuts without arguments will print a detailed Usage message.
There's no need to use cut, rev, or any other tools external to bash here at all. Just read each line into an array, and pick out the piece you want:
while IFS=, read -r -a entries; do    # split each line on commas into the array "entries"
    # index: array length minus 2 = the second-to-last field
    printf '%s\n' "${entries[${#entries[@]} - 2]}"
done <file
Doing this in pure bash is far faster than starting up a pipeline, at least for reasonably small inputs. For large inputs, the better tool is awk.
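If your bash is 4.3 or newer, negative array subscripts give a shorter way to write the same index; a minimal variant of the loop above:
while IFS=, read -r -a entries; do
    # negative subscripts count back from the end of the array (bash 4.3+)
    printf '%s\n' "${entries[-2]}"
done <file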
Code for GNU sed:
$ echo text,blah,blaah,foo|sed -r 's/^(\S+,){2}(\S+),.*/\2/'
blaah
$ echo this,is,another,text,line|sed -r 's/^(\S+,){2}(\S+),.*/\2/'
text
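The same substitution can also be applied to the whole file rather than echoing one line at a time (assuming the file.txt from the question):
$ sed -r 's/^(\S+,){2}(\S+),.*/\2/' file.txt
blaah
text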
Code example similar to sudo_O's awk code:
$ sed -r 's/.*,(\w+),\w+$/\1/' file
blaah
text
It might be better to use more specialised programs for CSV files, e.g. awk or Excel.
Awk is well suited for this:
awk -F, '{print $(NF-1)}' file
The variable NF is a special awk variable that contains the number of fields in the current record.
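For example, run against the sample file.txt from the question:
$ awk -F, '{print $(NF-1)}' file.txt
blaah
text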
A Perl solution similar to the awk solution from @iiSeymour:
perl -F, -lane 'print $F[-2]' file
These command-line options are used:
-n   loop over every line of the input file; do not automatically print each line
-l   removes newlines before processing, and adds them back afterwards
-a   autosplit mode: split input lines into the @F array (defaults to splitting on whitespace)
-F,  set the autosplit pattern to a comma so the fields match the comma-delimited input
-e   execute the perl code
The @F autosplit array starts at index [0], while awk fields start with $1.
$F[-1] is the last element
$F[-2] is the second-to-last element
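Run against the sample file.txt from the question, it prints the second-to-last field of each line:
$ perl -F, -lane 'print $F[-2]' file.txt
blaah
text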
Got a hint from "Unix cut except last two tokens" and was able to figure out the answer:
rev datafile | cut -d ',' -f 2 | rev
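How this works: the first rev reverses each line character by character, so the second-to-last field becomes the second field; cut -f 2 extracts it; the final rev restores the field's original character order. Against the sample file.txt from the question:
$ rev file.txt | cut -d ',' -f 2 | rev
blaah
text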