Airflow SparkSubmitOperator - How to spark-submit in another server
Question

I am new to Airflow and Spark and I am struggling with the SparkSubmitOperator. Our Airflow scheduler and our Hadoop cluster are not set up on the same machine (first question: is this good practice?). We have many automated procedures that need to call pyspark scripts. Those pyspark scripts are stored on the Hadoop cluster (10.70.1.35). The Airflow DAGs are stored on the Airflow machine (10.70.1.22). Currently, when we want to spark-submit a pyspark script with Airflow, we use a simple
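For context, here is a minimal sketch of what a SparkSubmitOperator task can look like in a DAG, assuming Airflow 2.x with the apache-spark provider installed; the connection ID "spark_yarn" and the application path are hypothetical placeholders, not values from the setup described above:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Illustrative DAG only; "spark_yarn" and the script path are assumptions.
with DAG(
    dag_id="spark_submit_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    submit_job = SparkSubmitOperator(
        task_id="submit_pyspark_script",
        # conn_id points to an Airflow connection describing the Spark master
        # (e.g. host "yarn"); spark-submit itself runs on the Airflow worker.
        conn_id="spark_yarn",
        # Path to the pyspark script as seen from the machine running spark-submit.
        application="/path/to/script.py",
        verbose=True,
    )
```

Note that the operator shells out to a local spark-submit binary, so whatever machine executes the task needs the Spark client and the cluster's configuration available to it.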