Can I add arguments to Python code when I submit a Spark job?

予麋鹿  2020-12-28 13:24

I'm trying to use spark-submit to execute my Python code on a Spark cluster.

Generally we run spark-submit with Python code like below.
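For instance, assuming a script named example.py (the name here is only a placeholder), a basic invocation without any extra arguments looks like:

    spark-submit example.py

The question is how to pass additional arguments to the script on that command line.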

5 Answers

  •  心在旅途
    2020-12-28 13:51

    You can pass arguments on the spark-submit command line and then access them in your code in the following way.

    sys.argv[1] gives you the first argument, sys.argv[2] the second, and so on. Refer to the example below.

    You can write code like the following to read the arguments that you pass on the spark-submit command:

    import sys

    # The first command-line argument is the number of table names that follow.
    n = int(sys.argv[1])

    # Collect the next n arguments as table names.
    tables = [sys.argv[i] for i in range(2, 2 + n)]
    print(tables)
    

    Save the above file as PysparkArg.py and execute the spark-submit command below:

    spark-submit PysparkArg.py 3 table1 table2 table3
    

    Output:

    ['table1', 'table2', 'table3']
    

    This approach is useful in PySpark jobs that need to fetch multiple tables from a database, where the number of tables and their names are supplied by the user when running the spark-submit command.
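    As a rough sketch of how the parsed table names might then be used (the SparkSession setup and spark.read.table calls below are illustrative assumptions, not part of the original answer; they assume the tables are registered in the cluster's catalog, e.g. a Hive metastore):

    import sys

    from pyspark.sql import SparkSession

    # Parse the table names from the command-line arguments, as above.
    n = int(sys.argv[1])
    tables = [sys.argv[i] for i in range(2, 2 + n)]

    spark = SparkSession.builder.appName("PysparkArg").getOrCreate()

    # Load each requested table into a DataFrame, keyed by table name.
    # Swap spark.read.table() for spark.read.jdbc() if the tables live in
    # an external database rather than in the catalog.
    dataframes = {name: spark.read.table(name) for name in tables}

    for name, df in dataframes.items():
        print(name, df.count())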
