How to run PySpark jobs from a local Jupyter notebook to a Spark master in a Docker container?

Submitted by 你说的曾经没有我的故事 on 2020-04-10 05:29:07

Question


I have a Docker container running Apache Spark with a master and a slave worker. I'm attempting to submit a job from a Jupyter notebook on the host machine. See below:

# Init
!pip install findspark
import findspark
findspark.init()


# Context setup
from pyspark import SparkConf, SparkContext
# Docker container is exposing port 7077
conf = SparkConf().setAppName('test').setMaster('spark://localhost:7077')
sc = SparkContext(conf=conf)
sc

# Execute step
import random
num_samples = 1000
def inside(p):     
  x, y = random.random(), random.random()
  return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)

The execute step shows the following error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: 
    Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 172.17.0.2, executor 0): 

    java.io.IOException: Cannot run program "/Users/omar/anaconda3/bin/python": error=2, No such file or directory

It looks to me like the command is trying to run the Spark job locally when it should be sending it to the Spark master specified in the previous steps. Is this not possible from a Jupyter notebook?

My container is based on https://hub.docker.com/r/p7hb/docker-spark/, but I installed Python 3.6 under /usr/bin/python3.6.


Answer 1:


I had to do the following before I created the SparkContext:

import os
# Path on master/worker where Python is installed
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3.6'

Some research suggested adding this to /usr/local/spark/conf/spark-env.sh instead:

export PYSPARK_PYTHON='/usr/bin/python3.6'

But that did not work for me.
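For reference, here is a minimal end-to-end sketch of the working approach, setting the environment variable before the SparkContext is created and rerunning the pi estimate from the question. The master URL and the Python path on the workers are assumptions carried over from the setup above; adjust them to your own image.

import os
import random

from pyspark import SparkConf, SparkContext

# Assumption: Python is installed at this path on the master/worker
# containers, as described in the question.
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3.6'

# Assumption: the Docker container still exposes the master on localhost:7077.
conf = SparkConf().setAppName('test').setMaster('spark://localhost:7077')
sc = SparkContext(conf=conf)

def inside(_):
    # Sample a random point in the unit square and test whether it
    # falls inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return x * x + y * y < 1

num_samples = 1000
count = sc.parallelize(range(num_samples)).filter(inside).count()
print(4 * count / num_samples)  # rough estimate of pi

sc.stop()

The key point is that PYSPARK_PYTHON must be set before the SparkContext is created: the value is picked up at context creation and forwarded to the executors. Otherwise the driver's own interpreter path (here /Users/omar/anaconda3/bin/python) is sent instead, which does not exist inside the containers and produces the "No such file or directory" error above.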



Source: https://stackoverflow.com/questions/44788720/how-to-run-pyspark-jobs-from-a-local-jupyter-notebook-to-a-spark-master-in-a-doc
