Getting the number of visible nodes in PySpark

长发绾君心 2020-12-24 07:49

I'm running some operations in PySpark, and recently increased the number of nodes in my configuration (which is on Amazon EMR). However, even though I tripled the number

5 Answers
  •  暖寄归人
    2020-12-24 08:16

    I found that my sessions were sometimes killed by the remote end, giving a strange Java error:

    Py4JJavaError: An error occurred while calling o349.defaultMinPartitions.
    : java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
    

    I worked around this with the following:

    def check_alive(spark_conn):
        """Check if the connection is alive. ``True`` if alive, ``False`` if not."""
        try:
            # Any round trip to the JVM fails once the SparkContext is stopped
            spark_conn._jsc.sc().getExecutorMemoryStatus()
            return True
        except Exception:
            return False

    def get_number_of_executors(spark_conn):
        """Return the number of entries in the executor memory status map."""
        if not check_alive(spark_conn):
            raise Exception('Unexpected Error: Spark Session has been killed')
        try:
            return spark_conn._jsc.sc().getExecutorMemoryStatus().size()
        except Exception:
            raise Exception('Unknown error')


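    For reference, here is a minimal usage sketch of the two helpers above; the session setup and app name are illustrative assumptions, not part of the original answer. Note that getExecutorMemoryStatus() reports the driver as well, so on a real cluster the number of worker executors is the returned size minus one.

    from pyspark.sql import SparkSession

    # Illustrative setup only; on EMR a SparkSession/SparkContext is
    # typically already provided by the environment.
    spark = SparkSession.builder.appName('executor-count-demo').getOrCreate()

    if check_alive(spark):
        n = get_number_of_executors(spark)
        # getExecutorMemoryStatus() includes the driver, so on a cluster the
        # worker executor count is n - 1 (in local mode n == 1).
        print(f'Entries reported: {n}')

    spark.stop()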