Windows error while running standalone pyspark

Submitted by 末鹿安然 on 2020-08-08 04:33:10

Question


I am trying to import pyspark in Anaconda and run some sample code. However, whenever I run the code in Anaconda, I get the following error message:

ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1021, in send_command
    self.socket.sendall(command.encode("utf-8"))
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1025, in send_command
    "Error while sending", e, proto.ERROR_ON_SEND)
py4j.protocol.Py4JNetworkError: Error while sending

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
Reloaded modules: py4j.protocol, pyspark.sql.context, py4j.java_gateway, py4j.compat, pyspark.profiler, pyspark.sql.catalog, pyspark.context, pyspark.sql.group, pyspark.sql.conf, pyspark.sql.readwriter, pyspark.resultiterable, pyspark.sql, pyspark.sql.dataframe, pyspark.traceback_utils, pyspark.cloudpickle, pyspark.rddsampler, pyspark.accumulators, pyspark.broadcast, py4j, pyspark.rdd, pyspark.sql.functions, pyspark.java_gateway, pyspark.statcounter, pyspark.conf, pyspark.serializers, pyspark.files, pyspark.join, pyspark.sql.streaming, pyspark.shuffle, pyspark, py4j.version, pyspark.sql.session, pyspark.sql.column, py4j.finalizer, py4j.java_collections, pyspark.status, pyspark.sql.window, pyspark.sql.utils, pyspark.storagelevel, pyspark.heapq3, py4j.signals, pyspark.sql.types
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/hlee/Desktop/pyspark.py', wdir='C:/Users/hlee/Desktop')

  File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/hlee/Desktop/pyspark.py", line 38, in <module>
    sc = SparkContext()

  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 115, in __init__
    conf, jsc, profiler_cls)

  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 168, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)

  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 233, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)

  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1401, in __call__
    answer, self._gateway_client, None, self._fqn)

  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.net.BindException: Cannot assign requested address: bind: Service 'sparkDriver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
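The py4j errors above all come from the Python side failing to talk to the Java gateway on 127.0.0.1:53294: WinError 10054 means an open connection was reset, and WinError 10061 means nothing was listening on that port at all. As a minimal diagnostic sketch (plain Python standard library, not part of PySpark or py4j), the hypothetical helper port_is_listening below just tries a TCP connection. The port 53294 is taken from this particular log and is likely to differ from run to run, so treat it as an example only.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises ConnectionRefusedError (WinError 10061 on
        # Windows) when nothing listens on the port; that error and socket
        # timeouts are all subclasses of OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example only: the gateway port is specific to one run.
    print(port_is_listening("127.0.0.1", 53294))
```

A False here while the Python session still holds a gateway object is consistent with the JVM side having gone away, which is what the repeated "pop from an empty deque" / "actively refused it" pairs suggest.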

The following is my sample code. I have no problem running Apache Spark from cmd.

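import os
import sys
import random

# Point SPARK_HOME at the local Spark install and make its Python
# bindings importable from this (Anaconda) interpreter.
spark_path = r"C:\spark"
os.environ['SPARK_HOME'] = spark_path
sys.path.insert(0, spark_path + "/bin")
sys.path.insert(0, spark_path + "/python/pyspark/")
sys.path.insert(0, spark_path + "/python/lib/pyspark.zip")
sys.path.insert(0, spark_path + "/python/lib/py4j-0.10.3-src.zip")

from pyspark import SparkContext

# Starts the Java gateway and the Spark driver (the call that fails in the traceback).
sc = SparkContext()

NUM_SAMPLES = 100000

def sample(p):
    # Throw one random point into the unit square: 1 if it lands inside
    # the quarter circle of radius 1, else 0. The argument p is unused.
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

# Count the hits over all samples; the hit ratio approximates pi/4.
count = sc.parallelize(range(0, NUM_SAMPLES)).map(sample) \
              .reduce(lambda a, b: a + b)
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))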
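For reference, the job itself is a Monte Carlo estimate of pi: each sample is a random point in the unit square, the fraction that lands inside the quarter circle approaches pi/4, and the reduce simply sums the hits. The sketch below runs the same map/reduce with Python's built-in map and functools.reduce, with no Spark involved; estimate_pi and its seed parameter are hypothetical additions for repeatability. If it runs cleanly in the same Anaconda environment, the sampling logic is fine and the problem is confined to starting the SparkContext, which matches the traceback pointing at sc = SparkContext().

```python
import random
from functools import reduce

NUM_SAMPLES = 100000


def sample(_):
    # One random point in the unit square: 1 if it lands inside the
    # quarter circle x*x + y*y < 1, else 0 (same test as the Spark job).
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0


def estimate_pi(num_samples=NUM_SAMPLES, seed=None):
    # Hypothetical helper: seed is only here to make runs repeatable.
    if seed is not None:
        random.seed(seed)
    # Local equivalent of sc.parallelize(...).map(sample).reduce(add).
    count = reduce(lambda a, b: a + b, map(sample, range(num_samples)))
    # The hit ratio approximates pi/4, so scale it by 4.
    return 4.0 * count / num_samples


print("Pi is roughly %f" % estimate_pi())
```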

I have downloaded winutils.exe, added a HADOOP_HOME environment variable, and added export SPARK_MASTER_IP=127.0.0.1 and export SPARK_LOCAL_IP=127.0.0.1 to the spark-env.sh file. However, I still get the same error. Can someone point out what I am missing?

Thank you in advance,
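The final exception is a java.net.BindException, "Cannot assign requested address", which the operating system raises when a program asks to bind to an IP address that no local network interface holds. One possible cause, assumed here and not confirmed by the question, is that the machine's hostname resolves to such an address (for example a stale VPN or DHCP address). The standard-library sketch below resolves the hostname and tries to bind a socket both to the result and to 127.0.0.1; can_bind and check_local_addresses are hypothetical helper names, and the lookup only approximates how Spark picks its driver address.

```python
import socket


def can_bind(host, port=0):
    """Return True if a TCP socket can be bound to (host, port).

    Port 0 asks the OS for any free port, so only the address is tested.
    Binding to an address no local interface owns fails with OSError
    (EADDRNOTAVAIL, "Cannot assign requested address"; WinError 10049
    on Windows).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))
        return True
    except OSError:
        return False


def check_local_addresses():
    # Resolve this machine's hostname to an IPv4 address, roughly what a
    # default "local host" lookup does; None if resolution fails.
    hostname = socket.gethostname()
    try:
        resolved = socket.gethostbyname(hostname)
    except OSError:
        resolved = None
    return {
        "hostname": hostname,
        "resolved": resolved,
        "resolved_bindable": resolved is not None and can_bind(resolved),
        "loopback_bindable": can_bind("127.0.0.1"),
    }


if __name__ == "__main__":
    print(check_local_addresses())
```

If resolved_bindable comes back False while loopback_bindable is True, one thing that may be worth trying (again an assumption for this case) is pointing Spark at the loopback address, for example through a SPARK_LOCAL_IP Windows environment variable or the spark.driver.host setting in SparkConf.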


Answer 1:


In my case I just had to restart the kernel.

The problem was that I was creating the Spark context twice: every time I made a mistake, I re-ran the code from the beginning.
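One way to read this answer: Spyder's runfile executes the script inside the same long-lived interpreter, so state from the previous run can survive into the next one, and PySpark allows only one active SparkContext per process. In PySpark itself the usual guards are SparkContext.getOrCreate(), which returns an already-running context instead of building a second one, and sc.stop() when the job is done. The class below is a hypothetical, Spark-free toy that only mimics that one-active-instance rule so the pattern can be run anywhere; it is not PySpark's implementation.

```python
class ToyContext:
    """Hypothetical, Spark-free stand-in for the "one active context" rule.

    Not PySpark code: it only mimics why creating a second context in the
    same interpreter is a problem and how get-or-create plus stop() avoids it.
    """

    # Class-level slot: it lives as long as the interpreter (kernel) does.
    _active = None

    def __init__(self):
        # Direct construction while another context is active is refused.
        if ToyContext._active is not None:
            raise ValueError("Cannot run multiple contexts at once")
        ToyContext._active = self

    @classmethod
    def get_or_create(cls):
        # Reuse a context left over from an earlier run instead of
        # constructing a second one.
        if cls._active is None:
            cls()
        return cls._active

    def stop(self):
        # Release the slot so the next run can create a fresh context.
        if ToyContext._active is self:
            ToyContext._active = None


sc = ToyContext.get_or_create()
# ... job code would go here ...
sc.stop()
```

With real PySpark the matching change would be sc = SparkContext.getOrCreate() at the top of the script and sc.stop() at the end; restarting the kernel achieves the same reset by discarding all interpreter state.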



Source: https://stackoverflow.com/questions/41049330/windows-error-while-running-standalone-pyspark
