I'm using PyCharm 2019.1 and Python 3.7 (set as the Project Interpreter). In PyCharm, I've added PySpark 2.4.2.
When I run the following code (to create a Spark DataFrame), I get this error:
    java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
    ....
    Exception: Java gateway process exited before sending its port number
From other SO issues, it seems this is related to a version mismatch; the question is how to resolve it.
My $SPARK_HOME points to Apache Spark 2.2.0. When I try to install pyspark 2.2.0 in PyCharm, it gives this error:
    Collecting pyspark==2.2.0
    Could not find a version that satisfies the requirement pyspark==2.2.0
    (from versions: 2.1.2, 2.1.3, 2.2.0.post0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3)
    No matching distribution found for pyspark==2.2.0
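(Note that per the list above, PyPI has no plain 2.2.0 release, only 2.2.0.post0.) To confirm which versions are actually in play, here's a minimal diagnostic sketch (not part of the failing script); as I understand it, PySpark resolves spark-submit from $SPARK_HOME when that variable is set, so the JVM side would come from the 2.2.0 install:

    import os
    import pyspark

    # Python-side version of the pip-installed package (2.4.2 in my setup)
    print(pyspark.__version__)

    # When SPARK_HOME is set, pyspark launches the Java gateway via
    # $SPARK_HOME/bin/spark-submit, so the JVM side comes from this install
    print(os.environ.get("SPARK_HOME"))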
Any ideas on how to fix this?
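One workaround suggested in the other SO threads (a sketch I haven't verified yet): clear SPARK_HOME before the first SparkSession is created, so the pip-installed PySpark 2.4.2 falls back to its own bundled spark-submit instead of the 2.2.0 scripts. The alternative would be pinning the pip package to a 2.2.x release that PyPI actually has (e.g. pyspark==2.2.0.post0 or pyspark==2.2.1 from the list above) to match $SPARK_HOME.

    import os

    # Must run before any SparkContext/SparkSession is created, since the
    # Java gateway is launched via spark-submit resolved from SPARK_HOME
    os.environ.pop("SPARK_HOME", None)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local").appName("CreatingDF").getOrCreate()
    print(spark.version)  # should now match the pip package, i.e. 2.4.2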
CODE ->
    from pyspark.sql import SparkSession

    d = {'a': 1, 'b': 2, 'c': 3}
    spark = SparkSession.builder.master("local").appName("CreatingDF").getOrCreate()
    # createDataFrame expects an iterable of rows, so wrap the dict in a list
    pandaDF = spark.createDataFrame([d])
    print(pandaDF)
ERROR ->
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    19/05/06 23:21:45 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
    java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
        at scala.collection.MapLike$class.default(MapLike.scala:228)
        at scala.collection.AbstractMap.default(Map.scala:59)
        at scala.collection.MapLike$class.apply(MapLike.scala:141)
        at scala.collection.AbstractMap.apply(Map.scala:59)
        at org.apache.spark.api.python.PythonGatewayServer$$anonfun$main$1.apply$mcV$sp(PythonGatewayServer.scala:50)
        at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1262)
        at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:37)
        at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Traceback (most recent call last):
      File "/Users/karanalang/PycharmProjects/PythonFalcon/FalconIncremental/python_createDF2.py", line 28, in <module>
        spark = SparkSession.builder.master("local").appName("CreatingDF").getOrCreate()
      File "/Users/karanalang/anaconda3/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
        sc = SparkContext.getOrCreate(sparkConf)
      File "/Users/karanalang/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 367, in getOrCreate
        SparkContext(conf=conf or SparkConf())
      File "/Users/karanalang/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 133, in __init__
        SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
      File "/Users/karanalang/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
        SparkContext._gateway = gateway or launch_gateway(conf)
      File "/Users/karanalang/anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
        return _launch_gateway(conf)
      File "/Users/karanalang/anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in _launch_gateway
        raise Exception("Java gateway process exited before sending its port number")
    Exception: Java gateway process exited before sending its port number