Python Round Function Issues with pyspark

Submitted by 大憨熊 on 2019-12-23 09:59:16

Question


I am relatively new to Spark, and I've run into an issue when I try to use Python's built-in round() function after importing pyspark functions. It seems to have to do with how I import the pyspark functions, but I am not sure what the difference is, or why one way causes issues and the other doesn't.

Expected behavior:

import pyspark.sql.functions
print(round(3.14159265359,2))
>>> 3.14

Unexpected behavior:

from pyspark.sql.functions import *
print(round(3.14159265359,2))
>>> ERROR

AttributeError                            Traceback (most recent call last)
<ipython-input-1-50155ca4fa82> in <module>()
      1 from pyspark.sql.functions import *
----> 2 print(round(3.1454848383,2))

/opt/spark/python/pyspark/sql/functions.py in round(col, scale)
    503     """
    504     sc = SparkContext._active_spark_context
--> 505     return Column(sc._jvm.functions.round(_to_java_column(col), scale))
    506 
    507 

AttributeError: 'NoneType' object has no attribute '_jvm'

Answer 1:


Use import pyspark.sql.functions as F to avoid the name conflict.

This way, all Python built-in functions keep working normally, and when you want a pyspark function you call it through the alias, e.g. F.round.
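A minimal sketch of the aliased-import pattern from this answer. The pyspark import is wrapped in a try/except so the snippet also runs on machines where Spark is not installed; either way, the built-in round is left untouched:

```python
# Aliased import keeps pyspark's round behind the F namespace,
# so the built-in round is never shadowed.
try:
    import pyspark.sql.functions as F  # F.round is pyspark's version
except ImportError:
    F = None  # Spark not installed; the built-in is unaffected either way

# The built-in round still resolves normally.
print(round(3.14159265359, 2))  # 3.14
```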




Answer 2:


Don't use import *, as it can clobber names in your namespace.

Pyspark has its own round function: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.round

So the built-in round function is shadowed by pyspark.sql.functions.round, which expects a Column and an active SparkContext. In the traceback above, SparkContext._active_spark_context is None because no Spark session has been started, hence the "'NoneType' object has no attribute '_jvm'" error.
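The shadowing can be reproduced without Spark at all. The sketch below uses a hypothetical stand-in function in place of pyspark.sql.functions.round, and shows that even after the built-in name is shadowed, the original is still reachable through the builtins module:

```python
import builtins

# Hypothetical stand-in for pyspark.sql.functions.round, so this
# runs without a Spark installation. A wildcard import would bind
# the name "round" in the same way this def does.
def round(col, scale):
    raise AttributeError("'NoneType' object has no attribute '_jvm'")

# The module-level name now shadows the built-in...
try:
    round(3.14159265359, 2)
except AttributeError as e:
    print("shadowed:", e)

# ...but the original built-in is always reachable via builtins.
print(builtins.round(3.14159265359, 2))  # 3.14
```

Explicitly qualifying builtins.round (or re-importing with from builtins import round) is a quick recovery when a wildcard import has already polluted the namespace.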



Source: https://stackoverflow.com/questions/52557734/python-round-function-issues-with-pyspark
