How do I unit test PySpark programs?


My current Java/Spark unit test approach works (detailed here) by instantiating a SparkContext using "local" and running unit tests with JUnit.
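For reference, the PySpark equivalent of that JUnit setup would be roughly the following; this is a minimal sketch only, with the class, app name, and test chosen for illustration:

    import unittest
    from pyspark import SparkContext


    class SparkTestCase(unittest.TestCase):
        """Sketch of the same pattern: a local SparkContext shared by the tests."""

        @classmethod
        def setUpClass(cls):
            # "local[2]" runs Spark in-process with two worker threads
            cls.sc = SparkContext("local[2]", "unit-tests")

        @classmethod
        def tearDownClass(cls):
            cls.sc.stop()

        def test_sum(self):
            rdd = self.sc.parallelize([1, 2, 3])
            self.assertEqual(rdd.sum(), 6)


    if __name__ == "__main__":
        unittest.main()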

The code has to be

7 Answers

    小蘑菇 (OP) · 2020-12-12 18:00

    I'd recommend using py.test as well. py.test makes it easy to create reusable SparkContext test fixtures and use them to write concise test functions. You can also specialize fixtures (to create a StreamingContext, for example) and use one or more of them in your tests.

    I wrote a blog post on Medium on this topic:

    https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

    Here is a snippet from the post:

    import pytest

    import wordcount  # module under test, from the blog post

    pytestmark = pytest.mark.usefixtures("spark_context")


    def test_do_word_counts(spark_context):
        """Test word counting.

        Args:
            spark_context: test fixture SparkContext
        """
        test_input = [
            ' hello spark ',
            ' hello again spark spark'
        ]

        input_rdd = spark_context.parallelize(test_input, 1)
        results = wordcount.do_word_counts(input_rdd)

        expected_results = {'hello': 2, 'spark': 3, 'again': 1}
        assert results == expected_results

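    The spark_context fixture used above lives in a conftest.py; the exact definition is in the blog post, but a minimal sketch might look like this (the fixture names, Spark configuration, and the streaming variant are illustrative, not the post's exact code):

    # conftest.py -- shared pytest fixtures (illustrative sketch)
    import pytest
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext


    @pytest.fixture(scope="session")
    def spark_context(request):
        """Create a local SparkContext shared by all tests in the session."""
        conf = SparkConf().setMaster("local[2]").setAppName("pytest-pyspark")
        sc = SparkContext(conf=conf)
        request.addfinalizer(lambda: sc.stop())
        return sc


    @pytest.fixture
    def streaming_context(spark_context, request):
        """Specialized fixture: a StreamingContext built on the shared SparkContext."""
        ssc = StreamingContext(spark_context, batchDuration=1)
        request.addfinalizer(lambda: ssc.stop(stopSparkContext=False))
        return ssc

    With the fixtures in place, running the suite is just `pytest` from the project root; the session-scoped SparkContext is created once and stopped when the test session ends.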