Apache Spark: JDBC connection not working

渐次进展 2020-12-06 11:45

I have asked this question previously as well but did not get any answer (Not able to connect to postgres using jdbc in pyspark shell).

I have successfully installed Sp…

6 Answers
  • 2020-12-06 12:04

    As jake256 suggested, the

    "driver", "org.postgresql.Driver"

    key-value pair was missing. In my case, I launched pyspark as:

    pyspark --jars /path/to/postgresql-9.4.1210.jar
    

    with the following instructions:

      from pyspark.sql import DataFrameReader
    
      url = 'postgresql://192.168.2.4:5432/postgres'
      # the 'driver' entry is the missing piece; without it Spark cannot resolve the JDBC driver
      properties = {'user': 'myUser', 'password': 'myPasswd', 'driver': 'org.postgresql.Driver'}
      df = DataFrameReader(sqlContext).jdbc(
          url='jdbc:%s' % url, table='weather', properties=properties
      )
      df.show()
    
      +-------------+-------+-------+-----------+----------+
      |         city|temp_lo|temp_hi|       prcp|      date|
      +-------------+-------+-------+-----------+----------+
      |San Francisco|     46|     50|       0.25|1994-11-27|
      |San Francisco|     43|     57|        0.0|1994-11-29|
      |      Hayward|     54|     37|0.239999995|1994-11-29|
      +-------------+-------+-------+-----------+----------+
    

    Tested on:

    • Ubuntu 16.04

    • PostgreSQL server version 9.5

    • PostgreSQL JDBC driver postgresql-9.4.1210.jar

    • Spark version spark-2.0.0-bin-hadoop2.6, but I am confident it should also work with spark-2.0.0-bin-hadoop2.7

    • Java JDK 1.8, 64-bit

    Other JDBC driver URLs can be found at: https://www.petefreitag.com/articles/jdbc_urls/

    The tutorial I followed is at: https://developer.ibm.com/clouddataservices/2015/08/19/speed-your-sql-queries-with-spark-sql/

    A similar solution was also suggested at: pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver

  • 2020-12-06 12:07

    This error seems to get thrown when you use the wrong version of the JDBC driver. Check https://jdbc.postgresql.org/download.html to make sure that you have the right one; a quick way to check which Java version you need to match is sketched after the notes below.

    Note in particular:

    • JDK 1.1 - JDBC 1. Note that with the 8.0 release JDBC 1 support has been removed, so look to update your JDK when you update your server.

    • JDK 1.2, 1.3 - JDBC 2.

    • JDK 1.3 + J2EE - JDBC 2 EE. This contains additional support for javax.sql classes.

    • JDK 1.4, 1.5 - JDBC 3. This contains support for SSL and javax.sql, but does not require J2EE as it has been added to the J2SE release.

    • JDK 1.6 - JDBC4. Support for JDBC4 methods is not complete, but the majority of methods are implemented.

    • JDK 1.7, 1.8 - JDBC41. Support for JDBC4 methods is not complete, but the majority of methods are implemented.
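
    A quick way to confirm which Java version your PySpark driver JVM is actually running on, and therefore which driver build to pick, is the minimal sketch below. Note that _jvm is a PySpark-internal attribute; it is used here only for this one-off check.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      # ask the driver JVM for its Java version, e.g. "1.8.0_292" -> use a JDBC41/JDBC42 driver build
      print(spark._jvm.System.getProperty("java.version"))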

  • 2020-12-06 12:07

    Please see this post: just place your script after all of the options, for example:
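
    The following invocation is only an illustration; the JAR path and the script name my_script.py are placeholders, not from the original post.

      # interactive shell: all options come before the shell starts
      pyspark --jars /path/to/postgresql-9.4.1210.jar --driver-class-path /path/to/postgresql-9.4.1210.jar

      # batch submission: the script is the last argument, after every option
      spark-submit --jars /path/to/postgresql-9.4.1210.jar --driver-class-path /path/to/postgresql-9.4.1210.jar my_script.py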

  • 2020-12-06 12:08

    I had this exact problem with MySQL/MariaDB and got a big clue from this question.

    So your pyspark command should be:

    pyspark --conf spark.executor.extraClassPath=<jdbc.jar> --driver-class-path <jdbc.jar> --jars <jdbc.jar> --master <master-URL>
    

    Also watch for errors when pyspark starts, such as "Warning: Local jar ... does not exist, skipping." and "ERROR SparkContext: Jar not found at ...". These most likely mean you spelled the path wrong.
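
    Once the shell starts cleanly, a read along these lines should work. This is only a minimal sketch: the host, database, table, credentials and driver class below are placeholders, not values from the original answer.

      # assumes the MySQL/MariaDB JDBC JAR was supplied via --jars as shown above
      df = sqlContext.read.format("jdbc").options(
          url="jdbc:mysql://localhost:3306/mydb",
          dbtable="mytable",
          user="myUser",
          password="myPasswd",
          driver="com.mysql.jdbc.Driver",  # or "org.mariadb.jdbc.Driver" for the MariaDB connector
      ).load()
      df.show()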

  • 2020-12-06 12:08

    That’s pretty straightforward. To connect to an external database and retrieve data into Spark DataFrames, an additional JAR file is required. E.g. with MySQL the JDBC driver is required. Download the driver package and extract mysql-connector-java-x.yy.zz-bin.jar to a path that is accessible from every node in the cluster, preferably on a shared file system. E.g. with Pouta Virtual Cluster such a path would be under /shared_data; here I use /shared_data/thirdparty_jars/.

    With direct Spark job submissions from a terminal one can specify the --driver-class-path argument, pointing to extra JARs that should be provided to workers with the job. However, this does not work with this approach, so we must configure these paths for the front-end and worker nodes in the spark-defaults.conf file, usually in the /opt/spark/conf directory.

      spark.driver.extraClassPath /your-path/mysql-connector-java-5.1.35-bin.jar
      spark.executor.extraClassPath /your-path/mysql-connector-java-5.1.35-bin.jar
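
    If you prefer not to edit spark-defaults.conf, the same two properties can also be set when the session is first created. This is a minimal sketch; the JAR path is a placeholder, and these classpath settings only take effect if they are applied before the driver JVM starts.

      from pyspark.sql import SparkSession

      jar = "/shared_data/thirdparty_jars/mysql-connector-java-5.1.35-bin.jar"  # placeholder path

      # must be set on the very first session, before the driver JVM launches
      spark = (
          SparkSession.builder
          .config("spark.driver.extraClassPath", jar)
          .config("spark.executor.extraClassPath", jar)
          .getOrCreate()
      )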

  • 2020-12-06 12:18

    A slightly more elegant solution:

    import java.util.Properties

    val props = new Properties()
    props.put("driver", "org.postgresql.Driver")
    // read.jdbc also needs the table name as its second argument
    val df = sqlContext.read.jdbc("jdbc:postgresql://[host]/[dbname]", "[table]", props)
    