Why does this python multiprocessing script slow down after a while?

夕颜 2021-01-02 05:07

Building on the script from this answer, I have the following scenario: a folder containing 2500 large tab-delimited text files (~55 MB each). Web logs, basically.

2 Answers
  •  刺人心 (OP)
     2021-01-02 05:29

    Why do things the simple way when you can make them complicated?

    Mount the drives via smbfs or whatnot on a Linux host and run:

    #! /bin/sh
    
    SRC="" # FIXME: source directory
    DST="" # FIXME: destination directory
    
    convert_line() {
        line=$1
        # cut's default delimiter is already tab, so no -d needed
        f1=$(printf '%s\n' "$line" | cut -f 1)
        f2=$(printf '%s\n' "$line" | cut -f 2)
        frest=$(printf '%s\n' "$line" | cut -f 3-)
    
        if [ "$f2" != "-" ]; then
            # keep only the hash, dropping md5sum's trailing " -"
            f2=$(printf '%s' "$f2" | md5sum | cut -d ' ' -f 1)
            # might wanna throw in some memoization
        fi
    
        printf '%s\t%s\t%s\n' "$f1" "$f2" "$frest"
    }
    
    convert_file() {
        base=$(basename "$1")
        # read line by line; word-splitting `cat` would break on spaces
        while IFS= read -r line; do
            convert_line "$line"
        done < "$1" > "$DST/hashed-$base"
    }
    
    for f in "$SRC"/*; do
        convert_file "$f"
    done
    

    Not tested; some rough edges might need polishing.
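    The same per-file transform can also be sketched in Python, which maps back to the multiprocessing setup the question is about: each file is independent, so a process pool can handle one file per worker. A rough sketch, assuming tab-delimited input where field 2 is hashed unless it is `-`; `SRC`, `DST`, and the pool size are placeholders, not the asker's actual setup:

    ```python
    import hashlib
    from functools import lru_cache
    from multiprocessing import Pool
    from pathlib import Path

    SRC = Path("src")  # FIXME: source directory (placeholder)
    DST = Path("dst")  # FIXME: destination directory (placeholder)

    @lru_cache(maxsize=None)
    def hashed(value):
        # memoization: each distinct value is hashed once per worker
        return hashlib.md5(value.encode()).hexdigest()

    def convert_line(line):
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and fields[1] != "-":
            fields[1] = hashed(fields[1])
        return "\t".join(fields)

    def convert_file(path):
        out = DST / f"hashed-{path.name}"
        with open(path) as src, open(out, "w") as dst:
            for line in src:
                dst.write(convert_line(line) + "\n")

    if __name__ == "__main__" and SRC.is_dir():
        with Pool() as pool:
            # one file per task; files are independent, so no shared state
            pool.map(convert_file, sorted(SRC.glob("*")))
    ```

    Note the `lru_cache` is per process: each pool worker builds its own cache, which is fine for memoization but means repeated values split across workers are hashed more than once.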
