I am trying to connect to a database with pyspark and I am using the following code:
    sqlctx = SQLContext(sc)
    df = sqlctx.load(url = "jdbc:postgresql
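For context, the complete read I am after would look roughly like the sketch below, written against the DataFrameReader API (sqlctx.read) since the keyword form of load varies between Spark versions. The host, port, database, table, user, and password are placeholders, not my real values:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()   # reuse the shell's context if one already exists
    sqlctx = SQLContext(sc)

    df = (
        sqlctx.read.format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/mydb")  # placeholder host/db
        .option("dbtable", "my_table")                           # placeholder table
        .option("user", "postgres")                              # placeholder user
        .option("password", "secret")                            # placeholder password
        .option("driver", "org.postgresql.Driver")
        .load()
    )
    df.printSchema()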
It is necessary to copy postgresql-42.1.4.jar to all of the nodes; in my case, I copied it to /opt/spark-2.2.0-bin-hadoop2.7/jars.
I also set the classpath in ~/.bashrc (export SPARK_CLASSPATH="/opt/spark-2.2.0-bin-hadoop2.7/jars"),
and it works fine in the pyspark console and in Jupyter.
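Since Spark 2.x reports SPARK_CLASSPATH as deprecated, an alternative I have used is to point the session at the driver jar programmatically. A minimal sketch, assuming the jar sits at the path above and no SparkContext has been created yet:

    from pyspark.sql import SparkSession

    driver_jar = "/opt/spark-2.2.0-bin-hadoop2.7/jars/postgresql-42.1.4.jar"

    spark = (
        SparkSession.builder
        .appName("postgres-jdbc")
        # Put the PostgreSQL driver on both the driver and executor classpaths;
        # these settings only take effect if no SparkContext is running yet.
        .config("spark.driver.extraClassPath", driver_jar)
        .config("spark.executor.extraClassPath", driver_jar)
        .getOrCreate()
    )

In an already running pyspark console or Jupyter kernel the JVM is up before this code executes, so there the jar has to be available at launch, which is effectively what copying it into the Spark jars directory achieves.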