How do you set the display precision in PySpark when calling .show()?
Consider the following example:
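The original DataFrame isn't reproduced here, but a minimal sketch of the kind of setup the answers below assume (the data and the column names col1/col2 are hypothetical) would be:

from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

spark = SparkSession.builder.getOrCreate()

# two double columns whose averages do not come out to tidy numbers
df = spark.createDataFrame(
    [(10.11, 14.21), (10.09, 14.22), (10.097, 14.21)],
    ["col1", "col2"],
)

df.select([avg(c).alias(c) for c in df.columns]).show()
# the unrounded averages print with full double precision
# (e.g. 14.213333333333333 for col2), which the answers below tidy up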
The easiest option is to use pyspark.sql.functions.round():
from pyspark.sql.functions import avg, round
df.select([round(avg(c), 3).alias(c) for c in df.columns]).show()
#+------+------+
#| col1| col2|
#+------+------+
#|10.099|14.213|
#+------+------+
This will maintain the values as numeric types.
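Because round() returns numeric columns, the result can still be used in further numeric expressions; one way to confirm this (using the hypothetical df sketched above) is to inspect the dtypes:

from pyspark.sql.functions import avg, round
rounded = df.select([round(avg(c), 3).alias(c) for c in df.columns])
print(rounded.dtypes)
# e.g. [('col1', 'double'), ('col2', 'double')]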
The functions are the same for Scala and Python; the only difference is the import.
You can use format_number to format a number to the desired number of decimal places, as stated in the official API documentation:
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
from pyspark.sql.functions import avg, format_number
df.select([format_number(avg(c), 3).alias(c) for c in df.columns]).show()
#+------+------+
#| col1| col2|
#+------+------+
#|10.099|14.213|
#+------+------+
The transformed columns would be of StringType, and a comma is used as a thousands separator; for example, with larger values:
#+-----------+--------------+
#| col1| col2|
#+-----------+--------------+
#|500,100.000|50,489,590.000|
#+-----------+--------------+
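To confirm that format_number produces strings rather than numbers (again assuming the same hypothetical df), the dtypes can be checked in the same way:

from pyspark.sql.functions import avg, format_number
formatted = df.select([format_number(avg(c), 3).alias(c) for c in df.columns])
print(formatted.dtypes)
# e.g. [('col1', 'string'), ('col2', 'string')]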
As stated in the Scala version of this answer, we can use regexp_replace to replace the , with any string you want:
Replace all substrings of the specified string value that match regexp with rep.
from pyspark.sql.functions import avg, format_number, regexp_replace
df.select(
    [regexp_replace(format_number(avg(c), 3), ",", "").alias(c) for c in df.columns]
).show()
#+----------+------------+
#| col1| col2|
#+----------+------------+
#|500100.000|50489590.000|
#+----------+------------+
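If you need numeric values back after stripping the separator, one option (a follow-up sketch, not part of the original answer) is to cast the cleaned string column to double:

from pyspark.sql.functions import avg, format_number, regexp_replace

df.select(
    [
        regexp_replace(format_number(avg(c), 3), ",", "")
        .cast("double")
        .alias(c)
        for c in df.columns
    ]
).show()
# the values are doubles again, though at that point round() from the
# first answer is the simpler route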