How to measure the execution time of a query on Spark

Backend · Open · 5 answers · 1338 views

误落风尘 2020-12-01 19:16

I need to measure the execution time of a query on Apache Spark (Bluemix). What I tried:

import time

startTimeQuery = time.clock()
df = sqlContext.sql(query)
         
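Two caveats with this attempt: `time.clock()` was deprecated and removed in Python 3.8 (`time.perf_counter()` is the replacement), and Spark SQL is lazily evaluated, so timing `sqlContext.sql(query)` alone measures only query-plan construction, not execution. A minimal sketch of timing a callable with `perf_counter` (the `timed` helper name is my own; an action such as `.collect()` or `.count()` is needed to force Spark to actually run the query):

```python
import time

def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = action()
    elapsed = time.perf_counter() - start
    return result, elapsed

# With Spark, wrap an action so the query is actually executed, e.g.:
# result, seconds = timed(lambda: sqlContext.sql(query).collect())
```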


        
5 Answers
  •  余生分开走
    2020-12-01 19:37

    For those looking for or needing a Python version
    (since a pyspark Google search leads to this post):

    from time import time
    from datetime import timedelta
    
    class T:
        """Context manager that prints the wall-clock time of its block."""
        def __enter__(self):
            self.start = time()
            return self
        def __exit__(self, exc_type, exc_value, traceback):
            self.end = time()
            elapsed = self.end - self.start
            print(str(timedelta(seconds=elapsed)))
    

    Usage:

    with T():
        # Spark code goes here, e.g.:
        df = sqlContext.sql(query)
    

    Inspired by: https://blog.usejournal.com/how-to-create-your-own-timing-context-manager-in-python-a0e944b48cf8

    This proved useful when working in the console or in notebooks (the Jupyter magics %%time and %timeit are limited to cell scope, which is inconvenient when you have objects shared across the notebook context).
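An equivalent sketch using the standard library's `contextlib.contextmanager`, which achieves the same effect without writing the dunder methods by hand (the `timer` name is my own choice):

```python
from contextlib import contextmanager
from datetime import timedelta
from time import perf_counter

@contextmanager
def timer():
    # Record the start, yield control to the with-block,
    # then print the elapsed time even if the block raises.
    start = perf_counter()
    try:
        yield
    finally:
        print(timedelta(seconds=perf_counter() - start))
```

Usage is the same shape as above: `with timer():` followed by the Spark code to be measured.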
