How do I unit test PySpark programs?

你的背包 2020-12-12 17:01

My current Java/Spark unit-test approach works (detailed here) by instantiating a SparkContext using "local" and running unit tests with JUnit.

The code has to be
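
For context, the PySpark analogue of that setup is usually a pytest fixture that starts a SparkContext in local mode and stops it after the test session. The sketch below is illustrative only; the fixture name, master setting, and app name are assumptions, not something from the question:

    # conftest.py -- minimal sketch of a local-mode SparkContext fixture for pytest
    import pytest
    from pyspark import SparkConf, SparkContext

    @pytest.fixture(scope="session")
    def sc():
        # Local mode plays the same role as the "local" master used with JUnit.
        conf = SparkConf().setMaster("local[2]").setAppName("unit-tests")
        sc = SparkContext(conf=conf)
        yield sc
        sc.stop()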

7 Answers
  •  长情又很酷
    2020-12-12 17:49

    Some time ago I also faced this issue, and after reading through several articles, forums, and some StackOverflow answers I ended up writing a small plugin for pytest: pytest-spark.

    I have been using it for a few months now, and the general workflow looks good on Linux:

    1. Install Apache Spark (set up a JVM and unpack Spark's distribution into some directory).
    2. Install "pytest" plus the "pytest-spark" plugin.
    3. Create a "pytest.ini" in your project directory and specify the Spark location there (see the sketch after this list).
    4. Run your tests with pytest as usual.
    5. Optionally, use the "spark_context" fixture provided by the plugin in your tests; it tries to minimize Spark's log output (see the test sketch after this list).
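
    A minimal "pytest.ini" for step 3 might look like the sketch below; the "spark_home" option name and the path are assumptions based on the plugin's documentation, so check pytest-spark's README for the exact setting:

        # pytest.ini -- point the plugin at the Spark installation (path is only an example)
        [pytest]
        spark_home = /opt/spark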
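
    A test that uses the "spark_context" fixture from step 5 could look like the following sketch, where word_count is a hypothetical function under test rather than anything from the answer:

        # test_word_count.py -- "spark_context" is injected by the pytest-spark plugin
        def word_count(rdd):
            # Count occurrences of each whitespace-separated word in an RDD of lines.
            return (rdd.flatMap(lambda line: line.split())
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))

        def test_word_count(spark_context):
            lines = spark_context.parallelize(["spark makes testing", "testing spark"])
            counts = dict(word_count(lines).collect())
            assert counts == {"spark": 2, "makes": 1, "testing": 2}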
