Spark to MongoDB via Mesos

北城以北 提交于 2019-12-12 04:59:13

问题


I am trying to connect Apache Spark to MongoDB using Mesos. Here is my architecture: -

MongoDB: MongoDB Cluster of 2 shards, 1 config server and 1 query server. Mesos: 1 Mesos Master, 4 Mesos slaves

Now I have installed Spark on just 1 node. There is not much information available on this out there. I just wanted to pose a few questions: -

As per what I understand, I can connect Spark to MongoDB via mesos. In other words, I end up using MongoDB as a storage layer. Do I really Need Hadoop? Is it mandatory to pull all the data into Hadoop just for Spark to read it?

Here is the reason I am asking this. The Spark Install expects the HADOOP_HOME variable to be set. This seems like very tight coupling !! Most of the posts on the net speak about MongoDB-Hadoop connector. It doesn't make sense if you're forcing me to move everything to hadoop.

Does anyone have an answer?

Regards Mario


回答1:


Spark itself takes a dependency on Hadoop and data in HDFS can be used as a datasource.

However, if you use the Mongo Spark Connector you can use MongoDB as a datasource for Spark without going via Hadoop at all.




回答2:


Spark-mongo connector is good idea, moreover if your are executing Spark in a hadoop cluster you need set HADOOP_HOME.

Check your requeriments and test it (tutorial)

Basic working knowledge of MongoDB and Apache Spark. Refer to the MongoDB documentation and Spark documentation.
Running MongoDB instance (version 2.6 or later).
Spark 1.6.x.
Scala 2.10.x if using the mongo-spark-connector_2.10 package
Scala 2.11.x if using the mongo-spark-connector_2.11 package

The new MongoDB Connector for Apache Spark provides higher performance, greater ease of use and, access to more advanced Spark functionality than the MongoDB Connector for Hadoop. The following table compares the capabilities of both connectors.

Then you need to configure Spark with mesos:

Connecting Spark to Mesos

To use Mesos from Spark, you need a Spark binary package available in a place accessible by Mesos, and a Spark driver program configured to connect to Mesos.

Alternatively, you can also install Spark in the same location in all the Mesos slaves, and configure spark.mesos.executor.home (defaults to SPARK_HOME) to point to that location.


来源:https://stackoverflow.com/questions/38995739/spark-to-mongodb-via-mesos

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!