Has anyone tried to use Shark/Spark on DataStax Enterprise?

Submitted by 拥有回忆 on 2019-12-11 14:33:28

Question


I've been trying to do this without success. First, I tried to use the Hive distribution bundled with DSE together with Shark. However, Shark ships with its own patched, older version of Hive (0.9, I believe), and the incompatibilities between the two make it impossible to run Shark. I also tried using Shark's patched Hive instead of DSE's, reusing DSE's Hive configuration (to make CFS available to Shark's Hive distribution). That only revealed a long list of dependencies on the full DSE classpath (Hive, Cassandra, Hadoop, etc.).

It is possible to do this with plain C* by following the instructions in this blog post.

Am I being stubborn by trying to use CFS? Is there a way to do this on DSE, with or without CFS?

Thanks!

Here are some shark-env.sh highlights:

export HIVE_HOME="/home/cassserv/hive-0.9.0-bin/" # use this with the Hive 0.9 distribution
#export HIVE_HOME="/usr/share/dse/hive/" # use this with DSE's Hive distribution
export HIVE_CONF_DIR="/home/cassserv/hive-0.9.0-bin/conf" # contains the edited DSE hive-site.xml
#export HIVE_CONF_DIR="/etc/dse/hive" # original DSE hive-site.xml

Edited hive-site.xml highlights:

<property>
  <name>hive.hwi.war.file</name>
  <!--<value>lib/hive-hwi.war</value>-->
  <value>lib/hive-hwi-0.9.0-shark-0.8.1.war</value><!--edited to use Shark's distribution-->
  <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}</description>
</property>

<property>
  <name>hadoop.bin.path</name>
  <!--<value>${dse.bin}/dse hadoop</value>-->
  <value>/usr/share/dse hadoop</value><!--edited to override the variable-->
</property>

Here's Shark's output when I use Shark's patched Hive distribution with DSE's Hive configuration. The missing class is in the dse.jar file:

Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore class not found)

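To confirm which jar provides a missing class (and therefore what would have to go on Shark's classpath), you can scan a directory of jars for the class's entry. This is a minimal Python sketch, not a DSE tool: the script name `find_class.py` is hypothetical, and you pass the directory to scan yourself, since no particular DSE install path is assumed.

```python
"""Sketch: report which jar files contain a given Java class.

Assumption: you pass the directory to scan (for example DSE's lib
directory); no particular install path is assumed here.
"""
import sys
import zipfile
from pathlib import Path


def class_entry(class_name: str) -> str:
    """Map a dotted class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(directory: str, class_name: str) -> list[str]:
    """Return the jars under `directory` (searched recursively) that contain `class_name`."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(directory).rglob("*.jar")):
        if not jar.is_file():
            continue  # a directory can also match *.jar
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            continue  # skip files that end in .jar but are not valid archives
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: find_class.py <jar-dir> <fully.qualified.ClassName>")
    else:
        for path in jars_containing(sys.argv[1], sys.argv[2]):
            print(path)
```

For example, `python find_class.py <jar-dir> com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore` prints every jar under that directory that contains the metastore class. Each jar it reports would have to be on Shark's classpath for DSE's metastore setting to load.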
I'm trying to figure out if I can do something like this in the edited hive-site.xml:

<property>
  <name>fs.cfs.impl</name>
  <value>org.apache.cassandra.hadoop.fs.CassandraFileSystem</value>
</property>
<property>
  <name>hive.metastore.rawstore.impl</name>
  <!--<value>com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore</value>-->
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Use the Apache Cassandra Hive RawStore implementation</description>
</property>

in order to remove any dependency on the DSE libraries. I might also avoid using DSE's Hadoop distribution.
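To check whether an edited hive-site.xml still points at DSE classes, a small script can list every property whose value is in the `com.datastax.` package. This is a minimal sketch under that assumption only: the script name `check_dse_deps.py` is hypothetical, and classes outside that package (for example the `fs.cfs.impl` class) are not flagged, even though they may still ship only with DSE. Commented-out values, like the original `CassandraHiveMetaStore` line, are ignored because Python's ElementTree discards XML comments by default.

```python
"""Sketch: list hive-site.xml properties whose values reference DSE classes.

Assumption: a value in the `com.datastax.` package needs dse.jar on the
classpath. XML comments are ignored because ElementTree drops them.
"""
import sys
import xml.etree.ElementTree as ET
from typing import Union

DSE_PREFIX = "com.datastax."


def dse_dependencies(xml_source: Union[str, bytes]) -> dict[str, str]:
    """Return {property name: value} for every value that mentions DSE_PREFIX."""
    root = ET.fromstring(xml_source)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if DSE_PREFIX in value:
            found[name] = value
    return found


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_dse_deps.py <path/to/hive-site.xml>")
    else:
        # Read bytes so the XML declaration, if present, decides the encoding.
        with open(sys.argv[1], "rb") as f:
            for name, value in dse_dependencies(f.read()).items():
                print(f"{name} -> {value}")
```

An empty result means no remaining `com.datastax.` references in that file. It does not prove that Shark's Hive will start without DSE jars.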


Answer 1:


DSE 4.5 ships with Spark and Shark 0.9 integrated. You don't need to set anything up; it works out of the box, the same way Pig and Hive did before.



Source: https://stackoverflow.com/questions/22552423/has-anyone-tried-to-use-shark-spark-on-datastax-enterprise
