Windows error while running standalone pyspark

Question

I am trying to import pyspark in Anaconda and run some sample code. However, whenever I run the code in Anaconda, I get the following error message.

ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1021, in send_command
    self.socket.sendall(command.encode("utf-8"))
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1025, in send_command
    "Error while sending", e, proto.ERROR_ON_SEND)
py4j.protocol.Py4JNetworkError: Error while sending

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:53294)
Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 827, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
Reloaded modules: py4j.protocol, pyspark.sql.context, py4j.java_gateway, py4j.compat, pyspark.profiler, pyspark.sql.catalog, pyspark.context, pyspark.sql.group, pyspark.sql.conf, pyspark.sql.readwriter, pyspark.resultiterable, pyspark.sql, pyspark.sql.dataframe, pyspark.traceback_utils, pyspark.cloudpickle, pyspark.rddsampler, pyspark.accumulators, pyspark.broadcast, py4j, pyspark.rdd, pyspark.sql.functions, pyspark.java_gateway, pyspark.statcounter, pyspark.conf, pyspark.serializers, pyspark.files, pyspark.join, pyspark.sql.streaming, pyspark.shuffle, pyspark, py4j.version, pyspark.sql.session, pyspark.sql.column, py4j.finalizer, py4j.java_collections, pyspark.status, pyspark.sql.window, pyspark.sql.utils, pyspark.storagelevel, pyspark.heapq3, py4j.signals, pyspark.sql.types
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/hlee/Desktop/pyspark.py', wdir='C:/Users/hlee/Desktop')

  File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/hlee/Desktop/pyspark.py", line 38, in <module>
    sc = SparkContext()

  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 115, in __init__
    conf, jsc, profiler_cls)

  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 168, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)

  File "C:\spark\python\lib\pyspark.zip\pyspark\context.py", line 233, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)

  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\java_gateway.py", line 1401, in __call__
    answer, self._gateway_client, None, self._fqn)

  File "C:\spark\python\lib\py4j-0.10.3-src.zip\py4j\protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.net.BindException: Cannot assign requested address: bind: Service 'sparkDriver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

Below is my sample code. I have no problem running Apache Spark from cmd.

import os
import sys
spark_path = r"C:\spark"
os.environ['SPARK_HOME'] = spark_path
sys.path.insert(0, spark_path + "/bin")
sys.path.insert(0, spark_path + "/python/pyspark/")
sys.path.insert(0, spark_path + "/python/lib/pyspark.zip")
sys.path.insert(0, spark_path + "/python/lib/py4j-0.10.3-src.zip")

from pyspark import SparkContext

sc = SparkContext()

import random
NUM_SAMPLES = 100000

def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

count = sc.parallelize(range(0, NUM_SAMPLES)).map(sample) \
              .reduce(lambda a, b: a + b)
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))

I have downloaded winutils.exe, added the HADOOP_HOME variable to my environment variables, and added export SPARK_MASTER_IP=127.0.0.1 and export SPARK_LOCAL_IP=127.0.0.1 to the spark-env.sh file. However, I am still getting the same error. Can someone help me and point out what I am missing?

Thank you in advance,
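
Editor's note on the spark-env.sh settings mentioned above: the export lines are shell syntax, and as far as I know the Windows launch scripts read conf\spark-env.cmd rather than spark-env.sh, so those lines may never take effect when the script runs from Spyder. A minimal sketch of setting the same variable from Python before the context is created follows. The helper names (apply_local_spark_env, driver_conf_pairs, start_local_context) and the local[*] master are assumptions for illustration, not something from the thread, and the thread does not establish that this cures the BindException on any given machine.

```python
import os

LOOPBACK = "127.0.0.1"  # assumption: single-machine, local-only setup


def apply_local_spark_env(environ, spark_home, local_ip=LOOPBACK):
    """Set SPARK_HOME and SPARK_LOCAL_IP in the given mapping.

    Pass os.environ so the JVM that pyspark launches later inherits them.
    """
    environ["SPARK_HOME"] = spark_home
    environ["SPARK_LOCAL_IP"] = local_ip
    return environ


def driver_conf_pairs(host=LOOPBACK):
    """Key-value pairs suitable for SparkConf().setAll(...)."""
    return [("spark.driver.host", host)]


def start_local_context():
    """Hypothetical helper: requires a working Spark install and pyspark on sys.path."""
    apply_local_spark_env(os.environ, r"C:\spark")
    # Import only after the environment is prepared.
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setMaster("local[*]").setAll(driver_conf_pairs())
    return SparkContext(conf=conf)
```

With the sys.path lines from the sample code in place, sc = start_local_context() would stand in for sc = SparkContext(). The variables have to be set before the first SparkContext is created, because the JVM only sees the environment it was started with.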

Answer 1:

In my case I just had to restart the kernel.

The problem was that I was creating the environment twice: every time I made a mistake I re-ran the code from the beginning.
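
If re-running the script in the same kernel is what creates the second context, one way to guard against it is to reuse a running SparkContext instead of constructing a new one each time. The sketch below uses SparkContext.getOrCreate for that. The helper names get_shared_context and stop_context, the app name, and the local[*] master are assumptions for illustration. Note that Spyder's runfile reloads modules (see "Reloaded modules" in the log), which may reset the state getOrCreate relies on, so restarting the kernel may still be necessary.

```python
def get_shared_context(app_name="pi-estimate"):
    """Return the running SparkContext, creating one only if none exists."""
    # Lazy import: requires pyspark on sys.path, as set up in the question.
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setMaster("local[*]").setAppName(app_name)
    # If a context is already active, it is returned and conf is not applied.
    return SparkContext.getOrCreate(conf)


def stop_context(sc):
    """Stop a context explicitly so that a later run starts clean."""
    if sc is not None:
        sc.stop()
```

In the sample script, sc = get_shared_context() would replace sc = SparkContext(), and calling stop_context(sc) at the end of the script releases the JVM-side resources before the next run.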

Source: https://stackoverflow.com/questions/41049330/windows-error-while-running-standalone-pyspark

Tags

python

apache-spark

localhost