Run Spark with built-in Hive and configure a remote PostgreSQL database for the Hive Metastore


Question


I am new to Spark and Hive. I am running Spark v1.0.1 with the built-in Hive (Spark built with SPARK_HIVE=true sbt/sbt assembly/assembly).

I also configured Hive to store the metastore in a PostgreSQL database, following these instructions:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html

I was able to configure standalone Hive (not the one built into Spark) to use PostgreSQL, but I don't know how to get it working with the Hive built into Spark.

In the instructions, I see that I need to copy or symlink postgresql-jdbc.jar into hive/lib so that Hive can load the PostgreSQL JDBC driver when it runs:

$ sudo yum install postgresql-jdbc
$ ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar

With the built-in Hive in Spark, where should I put postgresql-jdbc.jar to get it working?


Answer 1:


I found the solution to my problem. I need to add the driver to Spark's CLASSPATH so that the built-in Hive can use postgresql-jdbc4.jar.

I added 3 environment variables:

export CLASSPATH="$CLASSPATH:/usr/share/java/postgresql-jdbc4.jar"
export SPARK_CLASSPATH=$CLASSPATH
export SPARK_SUBMIT_CLASSPATH=$CLASSPATH

SPARK_CLASSPATH is used by spark-shell.

SPARK_SUBMIT_CLASSPATH is used by spark-submit (I am not sure).

Now I can use spark-shell with the built-in Hive, configured to use the metastore in PostgreSQL.
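
For reference, one way to make these exports permanent is to put them in $SPARK_HOME/conf/spark-env.sh, which spark-shell and spark-submit source at startup. A minimal sketch, assuming the JDBC driver sits at the path from the yum install above (it may differ on your system):

# Sketch: $SPARK_HOME/conf/spark-env.sh
# Make the PostgreSQL JDBC driver visible to Spark's built-in Hive metastore client.
export CLASSPATH="$CLASSPATH:/usr/share/java/postgresql-jdbc4.jar"
export SPARK_CLASSPATH=$CLASSPATH          # read by spark-shell
export SPARK_SUBMIT_CLASSPATH=$CLASSPATH   # read by spark-submit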




Answer 2:


You have two options:

  1. You can continue to use your own Hive installation. You need to put a copy of hive-site.xml (or make a symlink) at $SPARK_HOME/conf/hive-site.xml.
  2. If you want to use the built-in Hive, you need to modify $SPARK_HOME/hive-<version>/conf/hive-site.xml.
    Inside hive-site.xml you need to modify the javax.jdo.option.* values, along the lines of the following:

    <property>
      <name>hive.metastore.local</name>
      <value>true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:postgresql://localhost:5432/hivedb</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>org.postgresql.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>******</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>******</value>
    </property>

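Note that the ConnectionURL above assumes a PostgreSQL database named hivedb already exists and that the configured user can connect to it. A minimal sketch of creating them on a local PostgreSQL server, using a hypothetical hiveuser role and a placeholder password:

$ sudo -u postgres psql -c "CREATE ROLE hiveuser LOGIN PASSWORD 'mypassword';"
$ sudo -u postgres psql -c "CREATE DATABASE hivedb OWNER hiveuser;"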

Source: https://stackoverflow.com/questions/25151307/run-spark-with-build-in-hive-and-configuring-a-remote-postgresql-database-for-th
