Unit test PySpark code using Python

野趣味 2020-12-20 20:01

I have a PySpark script like the one below, and I want to unit test one of its functions.

def rename_chars(column_name):
    chars = ((' ', '_&'), ('.', '_$'))
    new_cols = functools.reduce(lambda a, kv: a.replace(*kv), chars, column_name)
    return new_cols
4 Answers
  •  無奈伤痛
    2020-12-20 20:08

    Here's one way to do it. Run the tests from the CLI with:

    python -m unittest my_unit_test_script.py
    

    Code

    import functools
    import unittest
    
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import HiveContext
    
    
    def rename_chars(column_name):
        chars = ((' ', '_&'), ('.', '_$'))
        new_cols = functools.reduce(lambda a, kv: a.replace(*kv), chars, column_name)
        return new_cols
    
    
    def column_names(df):
        changed_col_names = df.schema.names
        for cols in changed_col_names:
            df = df.withColumnRenamed(cols, rename_chars(cols))
        return df
    
    
    class RenameColumnNames(unittest.TestCase):
        def setUp(self):
            # Runs before every test; for a suite with many tests, it is
            # cheaper to build the SparkContext once in setUpClass instead.
            conf = SparkConf()
            sc = SparkContext(conf=conf)
            self.sqlContext = HiveContext(sc)
    
        def test_column_names(self):
            cols = ['ID', 'NAME', 'last.name', 'abc test']
            val = [(1, 'Sam', 'SMITH', 'eng'), (2, 'RAM', 'Reddy', 'turbine')]
            df = self.sqlContext.createDataFrame(val, cols)
            df = column_names(df)  # apply the function under test
            result = df.schema.names
            expected = ['ID', 'NAME', 'last_$name', 'abc_&test']
            self.assertEqual(result, expected)
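
    Note that `rename_chars` itself is a pure string function, so it can also be exercised with no Spark context at all, which keeps that test fast. A minimal sketch (same replacement rules as the code above):

```python
import functools


def rename_chars(column_name):
    # Blanks become '_&', dots become '_$', as in the answer above.
    chars = ((' ', '_&'), ('.', '_$'))
    return functools.reduce(lambda a, kv: a.replace(*kv), chars, column_name)


print(rename_chars('last.name'))  # last_$name
print(rename_chars('abc test'))   # abc_&test
```

    Only `column_names`, which touches a DataFrame, genuinely needs a Spark context in its test.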
    
