Accessing Hive tables in Spark

北荒 2021-01-20 19:24

I have a Hive 0.13 installation and have created custom databases. I have a Spark 1.1.0 single-node cluster built with the mvn -Phive option. I want to access the tables in this database from Spark.

1 Answer
  • 2021-01-20 19:57

    Step 1: Build Spark with Hive support:

    $ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
    $ cd $SPARK_HOME; ./sbt/sbt -Phive-thriftserver assembly
    

    Running these builds downloads the required JAR files and adds them to the assembly by default, so nothing needs to be added manually.
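
    As a quick sanity check (the exact path varies with your Spark and Scala versions, so treat it as an assumption), you can confirm that a Hive-enabled assembly JAR was produced:

    $ ls $SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*.jar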

    Step 2:
    Copy hive-site.xml from your Hive cluster to your $SPARK_HOME/conf/ directory, then edit the XML file and add the properties listed below:

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>XXXXXXXX</value>
        <description>Username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>XXXXXXXX</value>
        <description>Password to use against metastore database</description>
    </property>
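
    Before pointing Spark at the metastore, it can help to confirm that Hive itself can reach MySQL (assuming the hive CLI is on your PATH):

    $ hive -e "SHOW DATABASES;"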
    

    Step 3: Download the MySQL JDBC connector and add it to the Spark classpath by opening bin/compute-classpath.sh and adding the line below to that script:

    CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar"
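
    As an alternative sketch (not part of the original recipe), Spark 1.x can also take the connector JAR at launch time, which avoids editing compute-classpath.sh; the JAR path below is a placeholder:

    $ ./bin/spark-shell --driver-class-path /path/to/mysql-connector-java-5.1.10.jar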
    

    How to retrieve data from Hive into Spark:

    Step 1:
    Start all Hadoop daemons with the following command:

    start-all.sh
    

    Step 2:
    Start the Hive Thrift server (HiveServer2) with the following command:

    hive --service hiveserver2 & 
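
    To verify that HiveServer2 is listening (port 10000 is its default), something like the following works on most Linux systems:

    $ netstat -nlp | grep 10000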
    

    Step 3:
    Start the Spark server with the following command:

    start-spark.sh 
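
    Note: start-spark.sh looks like a site-specific wrapper script; if your distribution doesn't include it, the stock Spark standalone equivalent (an assumption about your setup) is:

    $ $SPARK_HOME/sbin/start-all.sh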
    

    And finally, check whether everything has started by running jps; you should see the following processes:

    RunJar 
    ResourceManager 
    Master 
    NameNode 
    SecondaryNameNode 
    Worker 
    Jps 
    JobHistoryServer 
    DataNode 
    NodeManager
    

    Step 4:
    Start the master with the following command:

    ./sbin/start-master.sh 
    

    To stop the master, use the command below:

    ./sbin/stop-master.sh
    

    Step 5:
    Open a new terminal and start beeline:

    hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline 
    

    At the beeline prompt, enter the connection string listed below:

    !connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver 
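
    Once connected, you can sanity-check the session with ordinary HiveQL; the database and table names below are placeholders:

    0: jdbc:hive2://localhost:10000> SHOW DATABASES;
    0: jdbc:hive2://localhost:10000> USE my_database;
    0: jdbc:hive2://localhost:10000> SELECT * FROM my_table LIMIT 10;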
    

    After that, configure Spark with the following commands.
    Note: put these settings in a configuration file so you don't have to set them in every session:

    set spark.master=spark://localhost:7077;
    set hive.execution.engine=spark;
    set spark.executor.memory=2g; -- adjust the memory to suit your server
    set spark.serializer=org.apache.spark.serializer.KryoSerializer;
    set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
    

    When it prompts for input, enter the query whose data you want to retrieve. Then open a browser and go to localhost:8080 (the Spark master web UI), where you can see the running jobs and completed jobs.
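
    For completeness: you can also query the same Hive tables straight from spark-shell via HiveContext, without going through the Thrift server. This is a minimal sketch against the Spark 1.1 API, and my_database.my_table is a placeholder:

    $ ./bin/spark-shell
    scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)  // sc is provided by the shell
    scala> hiveContext.sql("SHOW DATABASES").collect().foreach(println)
    scala> hiveContext.sql("SELECT * FROM my_database.my_table LIMIT 10").collect().foreach(println)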
