I have a Spark job which reads a source table, does a number of map / flatten / reduce operations, and then stores the results into a separate table we use for reporting. Currently the job is run manually; what is a good way to schedule it to run on a recurring basis?
You can use a crontab, but as you start having Spark jobs that depend on other Spark jobs, I would recommend Pinball for coordination: https://github.com/pinterest/pinball
To get a simple crontab working, I would create a wrapper script such as:
#!/bin/bash
# Wrapper script: submit a Spark job via spark-submit and append its output to a per-class log.
cd /locm/spark_jobs || exit 1

export SPARK_HOME=/usr/hdp/2.2.0.0-2041/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_USER_NAME=hdfs
export HADOOP_GROUP=hdfs
#export SPARK_CLASSPATH=$SPARK_CLASSPATH:/locm/spark_jobs/configs/*

CLASS=$1        # fully qualified main class of the job
MASTER=$2       # Spark master, e.g. yarn-cluster
ARGS=$3         # extra spark-submit options (left unquoted below so they word-split)
CLASS_ARGS=$4   # arguments passed through to the job's main class

echo "Running $CLASS with master: $MASTER, args: $ARGS, class args: $CLASS_ARGS"

"$SPARK_HOME"/bin/spark-submit --class "$CLASS" --master "$MASTER" \
  --num-executors 4 --executor-cores 4 \
  $ARGS spark-jobs-assembly*.jar $CLASS_ARGS \
  >> "/locm/spark_jobs/logs/$CLASS.log" 2>&1
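You would then invoke it with the class, master, spark-submit options, and job arguments as the four positional parameters. For example (a sketch only: the script path run_job.sh and the class com.example.ReportingJob are hypothetical placeholders for your own names):

/locm/spark_jobs/run_job.sh com.example.ReportingJob yarn-cluster "--driver-memory 2g" "2015-06-01"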
Then create a crontab entry by running crontab -e as the user that should own the job, and add a line with the schedule you want.
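For example, a line like the following (again assuming the hypothetical wrapper path and class above) would run the job every day at 2 AM:

# minute hour day-of-month month day-of-week  command
0 2 * * * /locm/spark_jobs/run_job.sh com.example.ReportingJob yarn-cluster "--driver-memory 2g" "daily" >> /locm/spark_jobs/logs/cron.log 2>&1

Since the wrapper already redirects spark-submit output to a per-class log, the extra cron redirection here just captures anything the script itself prints (such as the echo line) so failures are easier to diagnose.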