findspark.init() IndexError: list index out of range error

醉酒成梦 2020-12-17 10:11

When running the following in a Python 3.5 Jupyter environment I get the error below. Any ideas on what is causing it?

import findspark
findspark.init()


        
4 Answers
  • 2020-12-17 10:19

    Maybe this could help:

    I found that findspark.init() tries to find its files under .\spark-3.0.1-bin-hadoop2.7\bin\python\lib, but the python folder sits outside the bin folder. I simply ran findspark.init('.\spark-3.0.1-bin-hadoop2.7'), i.e. without the '\bin' part, and it worked.
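
    A minimal sketch of that fix, assuming Spark was extracted to .\spark-3.0.1-bin-hadoop2.7 relative to the working directory (a raw string keeps the Windows backslashes literal):

    import findspark
    # Point findspark at the Spark root (the folder that contains python\lib),
    # not at its bin\ subdirectory.
    findspark.init(r".\spark-3.0.1-bin-hadoop2.7")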

  • 2020-12-17 10:25

    I was getting the same error and was able to make it work by entering the exact installation directory:

    import findspark
    # Point findspark at the directory where Spark was extracted.
    # A raw string keeps the Windows backslashes from being read as escapes.
    findspark.init(r"C:\Users\PolestarEmployee\spark-1.6.3-bin-hadoop2.6")
    # Test
    from pyspark import SparkContext, SparkConf
    

    Basically, it is the directory where Spark was extracted. In the future, wherever you see spark_home, use that same installation directory. I also tried using toree to create a kernel instead, but it kept failing; a kernel would be a cleaner solution.
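
    To verify the fix beyond the import above, a short check (standard pyspark API; the app name is just a placeholder) could look like this:

    from pyspark import SparkConf, SparkContext

    # Build a local SparkContext to confirm SPARK_HOME was picked up correctly.
    conf = SparkConf().setMaster("local[*]").setAppName("findspark-check")
    sc = SparkContext(conf=conf)
    print(sc.version)  # prints the Spark version if everything is wired up
    sc.stop()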

  • 2020-12-17 10:28

    You need to update the SPARK_HOME variable in your bash_profile. For me, the following command worked (in a terminal):

    export SPARK_HOME="/usr/local/Cellar/apache-spark/2.2.0/libexec/"

    After this, you can run these commands:

    import findspark
    findspark.init('/usr/local/Cellar/apache-spark/2.2.0/libexec')
    
  • 2020-12-17 10:34

    This is most likely due to the SPARK_HOME environment variable not being set correctly on your system. Alternatively, you can just specify it when you're initialising findspark, like so:

    import findspark
    findspark.init('/path/to/spark/home')
    

    After that, it should all work!
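
    If you are not sure whether SPARK_HOME is set at all, a small diagnostic sketch (standard library only; the fallback path below is just a placeholder) can help:

    import os
    import findspark

    # findspark falls back to this environment variable when no path is given.
    spark_home = os.environ.get("SPARK_HOME")
    print("SPARK_HOME =", spark_home)

    # If it is missing or wrong, pass the Spark root explicitly.
    findspark.init(spark_home or "/path/to/spark/home")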
