Building on the script from this answer, I have the following scenario: a folder containing 2500 large text files (~55 MB each), all tab-delimited. Web logs, basically.
Why do things the simple way when one can make them complicated?
Mount the drives via smbfs/cifs or whatnot on a Linux host and run the script below.
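For the mount itself, something along these lines should do; the //fileserver/... shares, mount points, and username are made-up placeholders, so adjust to your setup:

mkdir -p /mnt/weblogs /mnt/hashed
mount -t cifs //fileserver/weblogs /mnt/weblogs -o ro,username=youruser
mount -t cifs //fileserver/hashed /mnt/hashed -o username=youruser

Point SRC and DST in the script at those mount points. The script itself: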
#! /bin/sh
SRC="" # FIXME: directory holding the original logs
DST="" # FIXME: directory for the hashed copies

# Rewrite one line: keep field 1, replace field 2 with its MD5 hash unless it
# is just "-", keep the remaining fields as-is. TAB is cut's default delimiter.
convert_line() {
    f1=$(printf '%s\n' "$1" | cut -f1)
    f2=$(printf '%s\n' "$1" | cut -f2)
    frest=$(printf '%s\n' "$1" | cut -f3-)
    if [ "x$f2" != "x-" ]; then
        # md5sum prints "<hash>  -"; keep only the hash itself
        f2=$(printf '%s' "$f2" | md5sum | cut -d' ' -f1)
        # might wanna throw in some memoization (see the sketch further down)
    fi
    printf '%s\t%s\t%s\n' "$f1" "$f2" "$frest"
}

# Convert one file line by line, writing the result to $DST/hashed-<name>.
convert_file() {
    out="$DST/hashed-$(basename "$1")"
    while IFS= read -r line; do
        convert_line "$line"
    done < "$1" > "$out"
}

for f in "$SRC"/*; do
    convert_file "$f"
done
Not tested; some rough edges might still need polishing.
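On the memoization note: if the second field repeats a lot, caching the hashes saves re-running md5sum for every single line. A rough sketch of what that could look like, assuming you switch the shebang to bash (4 or newer) to get associative arrays; hash_field is just a name I made up:

declare -A md5_cache   # field value -> previously computed hash

hash_field() {
    local value="$1"
    if [ -z "${md5_cache[$value]}" ]; then
        # first time we see this value: compute and remember its hash
        md5_cache[$value]=$(printf '%s' "$value" | md5sum | cut -d' ' -f1)
    fi
    printf '%s' "${md5_cache[$value]}"
}

convert_line would then call hash_field "$f2" instead of piping into md5sum directly.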