What is a Spark job?

死守一世寂寞 2021-01-30 16:50

I have already finished the Spark installation and executed a few test cases, setting up master and worker nodes. That said, I am quite confused about what exactly a "job" means in Spark.

2 answers
  •  暗喜 2021-01-30 17:38

    Hey, here's something I did before; hope it works for you:

    #!/bin/bash
    # Hadoop and server variables
    HADOOP="hadoop fs"
    HDFS_HOME="hdfs://ha-edge-group/user/max"
    LOCAL_HOME="/home/max"

    # Cluster sizing
    DRIVER_MEM="10G"
    EXECUTOR_MEM="10G"
    CORES="5"
    EXECUTORS="15"

    # Script arguments
    SCRIPT="availability_report.py"   # the application file itself (sys.argv[0] inside the script)
    APPNAME="Availability Report"     # arg[1]

    DAY=$(date -d yesterday +%Y%m%d)

    for HOUR in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    do
            # Local file to getmerge the HDFS output into
            LOCAL_OUTFILE="$LOCAL_HOME/availability_report/data/$DAY/$HOUR.txt"
            mkdir -p "$(dirname "$LOCAL_OUTFILE")"

            # Script arguments
            HDFS_SOURCE="webhdfs://1.2.3.4:0000/data/lbs_ndc/raw_${DAY}_${HOUR}"   # arg[2]
            HDFS_CELLS="webhdfs://1.2.3.4:0000/data/cells/CELLID_$DAY.txt"         # arg[3]
            HDFS_OUT_DIR="$HDFS_HOME/availability/$DAY/$HOUR"                      # arg[4]

            # One spark-submit per hour: each submit launches one application on YARN;
            # the jobs are created inside it by the actions the Python script runs.
            # (On Spark 2.x+ use: --master yarn --deploy-mode cluster)
            spark-submit \
            --master yarn-cluster \
            --driver-memory "$DRIVER_MEM" \
            --executor-memory "$EXECUTOR_MEM" \
            --executor-cores "$CORES" \
            --num-executors "$EXECUTORS" \
            --conf spark.scheduler.mode=FAIR \
            "$SCRIPT" "$APPNAME" "$HDFS_SOURCE" "$HDFS_CELLS" "$HDFS_OUT_DIR"

            # Merge the per-partition output files from HDFS into a single local file
            $HADOOP -getmerge "$HDFS_OUT_DIR" "$LOCAL_OUTFILE"
    done
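
    To connect this back to your question about what a "job" actually is: each spark-submit above launches one application, and the jobs are created inside that application whenever the Python script calls an action. Below is a minimal, hypothetical sketch of what an availability_report.py consuming those four arguments could look like; the record layout and the filtering logic are assumptions for illustration, not my actual script.

    # availability_report.py -- hypothetical skeleton matching the arguments
    # passed by the wrapper script above (appname, source dir, cells file, out dir).
    import sys
    from pyspark import SparkConf, SparkContext

    if __name__ == "__main__":
        app_name, hdfs_source, hdfs_cells, hdfs_out_dir = sys.argv[1:5]

        sc = SparkContext(conf=SparkConf().setAppName(app_name))

        # Transformations only build the lineage graph; nothing runs yet.
        raw = sc.textFile(hdfs_source)                    # raw hourly records
        cells = set(sc.textFile(hdfs_cells).collect())    # collect() is an action -> job #1
        cells_bc = sc.broadcast(cells)

        # Assumed record layout: the cell id is the first comma-separated field.
        seen = (raw.map(lambda line: line.split(",")[0])
                   .filter(lambda cell_id: cell_id in cells_bc.value)
                   .distinct())

        # saveAsTextFile() is another action -> job #2, which actually reads,
        # filters and writes the data on the executors.
        seen.saveAsTextFile(hdfs_out_dir)

        sc.stop()

    With a layout like that, the collect() on the cells file triggers one Spark job and saveAsTextFile() triggers another; the DAG scheduler then splits each job into stages and tasks that run on the executors requested in the wrapper. So roughly: application = one spark-submit, job = one action inside it, and stages/tasks are the units the scheduler actually distributes.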
    
