Using pyspark to connect to PostgreSQL

前端 未结 10 1409
逝去的感伤
逝去的感伤 2020-12-01 04:50

I am trying to connect to a database with pyspark and I am using the following code:

sqlctx = SQLContext(sc)
df = sqlctx.load(
    url = "jdbc:postgresql         


        
10条回答
  •  温柔的废话
    2020-12-01 05:12

    It is necesary copy postgresql-42.1.4.jar in all nodes... for my case, I did copy in the path /opt/spark-2.2.0-bin-hadoop2.7/jars

    Also, i set classpath in ~/.bashrc (export SPARK_CLASSPATH="/opt/spark-2.2.0-bin-hadoop2.7/jars" )

    and work fine in pyspark console and jupyter

提交回复
热议问题