I am trying to connect to a database with pyspark and I am using the following code:
sqlctx = SQLContext(sc)
df = sqlctx.load(
url = "jdbc:postgresql
Just start pyspark with the PostgreSQL JDBC driver jar passed via --jars, so the driver is on the classpath.
E.g.: pyspark --jars /path/Downloads/postgresql-42.2.16.jar
Then create a DataFrame as suggested in the other answers above.
E.g.:
df2 = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/db")
    .option("dbtable", "yourTableHere")
    .option("user", "postgres")
    .option("password", "postgres")
    .option("driver", "org.postgresql.Driver")
    .load()
)
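
If you run a standalone script rather than the pyspark shell, the same thing can probably be done by pointing the `spark.jars` setting at the driver jar when the session is built. This is only a rough sketch: the helper names (`jdbc_options`, `build_session`, `read_table`) and the app name are made up for illustration, the connection values are the placeholders from the example above, and it assumes no Spark session is already running, since the jar setting has to be in place before the JVM starts.

```python
def jdbc_options(host, port, database, table, user, password):
    """Collect the JDBC reader options in one dict (hypothetical helper)."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def build_session(jar_path):
    """Create a SparkSession with the JDBC driver jar on the classpath.

    Assumption: no session exists yet, so the setting takes effect.
    """
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    return (
        SparkSession.builder
        .appName("postgres-jdbc-example")  # illustrative name
        .config("spark.jars", jar_path)
        .getOrCreate()
    )


def read_table(spark, options):
    """Load the table described by `options` into a DataFrame."""
    return spark.read.format("jdbc").options(**options).load()
```

Usage would look roughly like this, with your own jar path and credentials:

```python
spark = build_session("/path/Downloads/postgresql-42.2.16.jar")
opts = jdbc_options("localhost", 5432, "db", "yourTableHere", "postgres", "postgres")
df2 = read_table(spark, opts)
df2.printSchema()
```

Keeping the options in a dict makes it easier to reuse the same connection settings for several tables.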