I am working in PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When I attempt…
You are missing winutils.exe, a Hadoop binary. Download the winutils.exe build that matches your system (64-bit or 32-bit) and set your Hadoop home to the folder whose bin subfolder contains it.
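A quick way to confirm that winutils.exe is really what is missing is to look for it where Hadoop expects it, at %HADOOP_HOME%\bin\winutils.exe. The sketch below uses hypothetical helper names (find_winutils and winutils_present are not part of Hadoop or PySpark) and uses ntpath so the Windows path is built the same way wherever the check runs.

```python
import ntpath
import os


def find_winutils(hadoop_home):
    """Return the expected location of winutils.exe under a Hadoop home folder.

    HADOOP_HOME must point at the folder that *contains* bin, not at bin itself.
    """
    return ntpath.join(hadoop_home, "bin", "winutils.exe")


def winutils_present(hadoop_home=None):
    """Return True if winutils.exe exists where Hadoop would look for it."""
    if hadoop_home is None:
        hadoop_home = os.environ.get("HADOOP_HOME")
    if not hadoop_home:
        return False  # HADOOP_HOME is not set at all
    return os.path.isfile(find_winutils(hadoop_home))
```

For example, winutils_present(r"C:\hadoop") returns False until C:\hadoop\bin\winutils.exe exists.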
1st way:
1. Create a hadoop folder on your system, e.g. in C:
2. Create a bin folder inside the hadoop directory, e.g. C:\hadoop\bin
3. Put winutils.exe in bin, e.g. C:\hadoop\bin\winutils.exe
4. Create a new environment variable:
   Name: HADOOP_HOME
   Value: C:\hadoop\
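The folder steps above (creating hadoop, creating bin inside it, and copying winutils.exe into bin) can also be scripted. This is a minimal sketch with a hypothetical helper, install_winutils; it assumes you have already downloaded the winutils.exe build matching your system and does not download anything itself.

```python
import os
import shutil


def install_winutils(downloaded_exe, hadoop_home):
    """Copy a downloaded winutils.exe into <hadoop_home>/bin and return the new path.

    Creates hadoop_home and its bin subfolder if they do not exist yet.
    """
    bin_dir = os.path.join(hadoop_home, "bin")
    if not os.path.isdir(bin_dir):
        os.makedirs(bin_dir)  # also creates hadoop_home if it is missing
    target = os.path.join(bin_dir, "winutils.exe")
    shutil.copy(downloaded_exe, target)  # overwrites an older copy, if any
    return target
```

Usage, with a hypothetical download location: install_winutils(r"C:\Users\me\Downloads\winutils.exe", r"C:\hadoop"). The variable from the last step can then be created from a command prompt with setx HADOOP_HOME C:\hadoop; setx only affects processes started afterwards, so restart Jupyter once it is set.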
2nd way:
You can also set the Hadoop home directly in your Java program, before the SparkContext is created. Note that the backslash must be escaped in a Java string literal:
System.setProperty("hadoop.home.dir", "C:\\hadoop");
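The Java line above has no direct counterpart in a PySpark notebook, because PySpark starts the JVM itself. A commonly used workaround, sketched here under the assumption that C:\hadoop already holds bin\winutils.exe, is to set HADOOP_HOME in os.environ before the first SparkContext is created, so the JVM that PySpark launches inherits it. set_hadoop_home and make_context are hypothetical helper names, and the app name is arbitrary.

```python
import os


def set_hadoop_home(path):
    """Point HADOOP_HOME at the folder containing bin\\winutils.exe for this process.

    Must run before the first SparkContext is created: the JVM started by
    PySpark only sees environment variables that exist when it launches.
    """
    os.environ["HADOOP_HOME"] = path
    return path


def make_context(hadoop_home, app_name="winutils-check"):
    """Set HADOOP_HOME, then create a local SparkContext (requires pyspark)."""
    set_hadoop_home(hadoop_home)
    # Imported here so set_hadoop_home can be used without pyspark installed.
    from pyspark import SparkContext
    return SparkContext("local", app_name)
```

Usage: sc = make_context(r"C:\hadoop"). If the notebook already created sc at startup, the variable has to be set before that happens (or the kernel restarted), since it is only read when the JVM starts.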