I have two files I'm trying to join/merge based on columns 1 and 2. They look something like this, with file1 (58210 lin
You can use the join command, but you need to create a single join field in each data table. Assuming that you do have values other than 2L in column 1, this code should work regardless of whether the two input files are sorted:
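# Temporary file prefix; files are removed on exit or on HUP, INT, QUIT, PIPE, TERM
tmp=${TMPDIR:-/tmp}/tmp.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15
# Prefix each line with a col1:col2 key, then sort so join can match on it
awk '{print $1 ":" $2, $0}' file1 | sort > $tmp.1
awk '{print $1 ":" $2, $0}' file2 | sort > $tmp.2
# Print the four columns of file2 followed by column 3 of file1
join -o 2.2,2.3,2.4,2.5,1.4 $tmp.1 $tmp.2
rm -f $tmp.?
# Cancel the exit trap so a normal finish does not exit with status 1
trap 0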
If you have bash and 'process substitution', or if you know that the data is already sorted appropriately, you can simplify the processing.
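As a rough sketch of what the process-substitution variant might look like (this assumes bash; the two small sample files are cut down from the data shown further down, and the workdir and joined names and the LC_ALL=C setting are my own additions so that sort and join agree on ordering):

```bash
#!/usr/bin/env bash
# Sketch only: the same key-and-join idea without explicit temporary key files.
export LC_ALL=C   # assumption: byte-order collation for both sort and join

workdir=$(mktemp -d)   # scratch directory for the sample input files
printf '%s\n' '2L 5753 33158' '2L 7885 33158' '2L 7885 33159' '2L 5095 33158' \
    > "$workdir/file1"
printf '%s\n' '2L 5095 0.666666666666667 1' \
    '2L 5904 0.571428571428571 0.869565217391304' \
    '2L 7885 0.714285714285714 0.864864864864865' \
    > "$workdir/file2"

# Each <(...) feeds join a keyed, sorted stream instead of a named temp file.
joined=$(join -o 2.2,2.3,2.4,2.5,1.4 \
    <(awk '{print $1 ":" $2, $0}' "$workdir/file1" | sort) \
    <(awk '{print $1 ":" $2, $0}' "$workdir/file2" | sort))
printf '%s\n' "$joined"

rm -rf "$workdir"
```

If the inputs are already sorted on the combined key, the two sort stages could be dropped, but join will complain or silently miss rows if that assumption is wrong.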
I'm not entirely sure why your code wasn't working, but I'd probably be using a[$1,$2] for the subscripts; it will give you less trouble if some of your column 1 values are pure numeric and can therefore be confused when you concatenate columns 1 and 2. That's why the 'key creation' awk scripts used a colon between the fields.
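For what it's worth, a minimal awk-only sketch using a[$1,$2]-style subscripts might look like the following. This is not your original code, which I can't see in full; the array name row, the variable names, and the sample files are hypothetical. The comma in the subscript makes awk join the two values with SUBSEP, so a pair like 1 and 23 cannot collide with 12 and 3.

```bash
# Sketch only: merge on columns 1 and 2 entirely in awk, with no sorting needed.
workdir=$(mktemp -d)   # scratch directory for the sample input files
printf '%s\n' '2L 5753 33158' '2L 7885 33158' '2L 7885 33159' '2L 5095 33158' \
    > "$workdir/file1"
printf '%s\n' '2L 5095 0.666666666666667 1' \
    '2L 5904 0.571428571428571 0.869565217391304' \
    '2L 7885 0.714285714285714 0.864864864864865' \
    > "$workdir/file2"

# First input (file2): remember each row under the composite key (col1, col2).
# Second input (file1): when the key was seen, print the file2 row plus file1's column 3.
merged=$(awk 'NR == FNR { row[$1, $2] = $0; next }
              ($1, $2) in row { print row[$1, $2], $3 }' \
    "$workdir/file2" "$workdir/file1")
printf '%s\n' "$merged"

rm -rf "$workdir"
```

The output comes out in file1 order rather than sorted by key, and the whole of file2 is held in memory, so this trades the sort for space. It also relies on keys being unique in file2, which holds for the data shown here.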
With the revised data files as shown (file1 first, then file2):
2L 5753 33158
2L 8813 33158
2L 7885 33158
2L 7885 33159
2L 1279 33158
2L 5095 33158
2L 3256 33158
2L 5372 33158
2L 7088 33161
2L 5762 33161
2L 5095 0.666666666666667 1
2L 5372 0.5 0.925925925925926
2L 5762 0.434782608695652 0.580645161290323
2L 5904 0.571428571428571 0.869565217391304
2L 5974 0.434782608695652 0.694444444444444
2L 6353 0.785714285714286 0.84
2L 7088 0.590909090909091 0.733333333333333
2L 7885 0.714285714285714 0.864864864864865
2L 7902 0.642857142857143 0.810810810810811
2L 8263 0.833333333333333 0.787878787878788
(file2 is unchanged from the question.) The output is:
2L 5095 0.666666666666667 1 33158
2L 5372 0.5 0.925925925925926 33158
2L 5762 0.434782608695652 0.580645161290323 33161
2L 7088 0.590909090909091 0.733333333333333 33161
2L 7885 0.714285714285714 0.864864864864865 33158
2L 7885 0.714285714285714 0.864864864864865 33159