I'm running some operations in PySpark on Amazon EMR, and recently increased the number of nodes in my configuration. However, even though I tripled the number of nodes, I found that my sessions were sometimes killed by the remote end with a strange Java error:
    Py4JJavaError: An error occurred while calling o349.defaultMinPartitions.
    : java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
I worked around this with the following helper functions:
    def check_alive(spark_conn):
        """Check if the connection is alive. ``True`` if alive, ``False`` if not."""
        try:
            # Any lightweight call through the Py4J gateway raises if the
            # underlying SparkContext has been stopped.
            spark_conn._jsc.sc().getExecutorMemoryStatus()
            return True
        except Exception:
            return False
    def get_number_of_executors(spark_conn):
        """Return the executor count, failing fast if the session is dead."""
        if not check_alive(spark_conn):
            raise RuntimeError('Unexpected error: the Spark session has been killed')
        try:
            # Note: getExecutorMemoryStatus() includes an entry for the driver.
            return spark_conn._jsc.sc().getExecutorMemoryStatus().size()
        except Exception as exc:
            raise RuntimeError('Could not fetch executor status') from exc
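To see why this works without spinning up a cluster, here is a minimal sketch that exercises `check_alive` against stub objects that mimic the `spark._jsc.sc()` call chain. The `Stub*` classes are hypothetical stand-ins for the real Py4J proxies, and `check_alive` is repeated so the snippet is self-contained; a stopped context is simulated by raising from `getExecutorMemoryStatus()`, just as the real JVM side does.

```python
class _StubStatus:
    """Stand-in for the Java map returned by getExecutorMemoryStatus()."""
    def size(self):
        return 3  # e.g. driver plus two executors (hypothetical figure)

class _StubContext:
    """Stand-in for the JVM SparkContext reached via spark._jsc.sc()."""
    def __init__(self, stopped=False):
        self._stopped = stopped

    def getExecutorMemoryStatus(self):
        if self._stopped:
            # Mirrors the real error raised on a stopped SparkContext.
            raise RuntimeError("Cannot call methods on a stopped SparkContext.")
        return _StubStatus()

class _StubJsc:
    def __init__(self, stopped=False):
        self._sc = _StubContext(stopped)

    def sc(self):
        return self._sc

class StubSession:
    """Hypothetical stand-in for a pyspark.sql.SparkSession."""
    def __init__(self, stopped=False):
        self._jsc = _StubJsc(stopped)

def check_alive(spark_conn):
    """Same helper as above: probe the JVM; any exception means dead."""
    try:
        spark_conn._jsc.sc().getExecutorMemoryStatus()
        return True
    except Exception:
        return False

live = StubSession()
dead = StubSession(stopped=True)
print(check_alive(live))  # True
print(check_alive(dead))  # False
```

The same pattern extends naturally to the real thing: call `check_alive(spark)` before submitting work, and rebuild the session (e.g. via `SparkSession.builder.getOrCreate()`) when it returns `False`.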