How to set display precision in PySpark Dataframe show

日久生厌 2020-12-10 14:34

How do you set the display precision in PySpark when calling .show()?

Consider the following example:



        
1 answer
  • 2020-12-10 15:21

    Round

    The easiest option is to use pyspark.sql.functions.round():

    from pyspark.sql.functions import avg, round
    df.select([round(avg(c), 3).alias(c) for c in df.columns]).show()
    #+------+------+
    #|  col1|  col2|
    #+------+------+
    #|10.099|14.213|
    #+------+------+
    

    This will maintain the values as numeric types.

    Format Number

    The functions are the same in Scala and Python; only the import differs.

    You can use format_number to format a number to the desired number of decimal places, as stated in the official API documentation:

    Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.

    from pyspark.sql.functions import avg, format_number 
    df.select([format_number(avg(c), 3).alias(c) for c in df.columns]).show()
    #+------+------+
    #|  col1|  col2|
    #+------+------+
    #|10.099|14.213|
    #+------+------+
    

    The transformed columns will be of StringType, and a comma is used as the thousands separator:

    #+-----------+--------------+
    #|       col1|          col2|
    #+-----------+--------------+
    #|500,100.000|50,489,590.000|
    #+-----------+--------------+
    

    As noted in the Scala version of this answer, regexp_replace can replace the comma with any string you want. Its documentation reads:

    Replace all substrings of the specified string value that match regexp with rep.

    from pyspark.sql.functions import avg, format_number, regexp_replace
    df.select(
        [regexp_replace(format_number(avg(c), 3), ",", "").alias(c) for c in df.columns]
    ).show()
    #+----------+------------+
    #|      col1|        col2|
    #+----------+------------+
    #|500100.000|50489590.000|
    #+----------+------------+
    