Iterate over a specific Spark dataframe column

Question


I want to encrypt a few columns of a Spark dataframe based on some condition. The encrypt/decrypt function below works fine:

from cryptography.fernet import Fernet

def EncryptDecrypt(Encrypt, str):
    # Hard-coded symmetric Fernet key (for illustration only)
    key = b'B5oRyf5Zs3P7atXIf-I5TaCeF3aM1NEILv3A7Zm93b4='
    cipher_suite = Fernet(key)
    if Encrypt is True:
        # Fernet works on bytes, so encode the input string first
        a = bytes(str, "utf-8")
        return cipher_suite.encrypt(a)
    else:
        return cipher_suite.decrypt(str)
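
A quick round-trip check of the function (the sample string is an arbitrary example):

token = EncryptDecrypt(True, "some sensitive value")   # Fernet token (bytes)
plain = EncryptDecrypt(False, token)                   # b'some sensitive value'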

Now I want to iterate over a specific dataframe column and encrypt it. If the encryption condition is satisfied, I have to iterate over that column:

if sqldf.filter(condition satisfied).count() > 0:
    iterate over that specific column to encrypt its data

I have to maintain the dataframe's column positions, so I can't add the encrypted column at the end.

Please help me iterate over the dataframe rows, and let me know if there is a more optimized approach.
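
For what it's worth, a minimal sketch of one way to avoid row-by-row iteration: wrap the encrypt path in a UDF and replace the column in place with withColumn, which keeps the column position. The dataframe name sqldf comes from the question; the column name personal_info and the null-check condition are assumptions for illustration.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Wrap the encrypt path of EncryptDecrypt as a UDF. Fernet tokens are
# base64 text, so decoding keeps the column a string column.
encrypt_udf = F.udf(
    lambda s: EncryptDecrypt(True, s).decode("utf-8") if s is not None else None,
    StringType())

# "personal_info" is a hypothetical column name. Passing an existing column
# name to withColumn replaces it in place, so the column order is preserved.
if sqldf.filter(F.col("personal_info").isNotNull()).count() > 0:
    sqldf = sqldf.withColumn("personal_info", encrypt_udf(F.col("personal_info")))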


Edit: Below is the approach I am now using.

I am trying to call the UDF through Spark SQL, but it fails at a = bytes(str, "utf-8") with TypeError: encoding without a string argument. Below is the code I am using to register the UDF and execute it through Spark SQL:

spark.udf.register("my_udf", EncryptDecrypt, ByteType())
sqldf1 = spark.sql("Select my_udf(True, " + column + ") from df1")

column is the field name.
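
For reference, this TypeError means the UDF received something that is not a str (for example a NULL or a non-string value), so bytes(str, "utf-8") fails. Note also that in PySpark, ByteType is a single 8-bit integer; BinaryType or StringType is a better fit for encrypted output. A hedged sketch with a guarding wrapper (encrypt_value is a hypothetical helper, and df1 is assumed to be registered as a temp view):

from pyspark.sql.types import StringType

def encrypt_value(s):
    # Guard against NULLs and non-string values before encoding
    if s is None:
        return None
    token = EncryptDecrypt(True, s if isinstance(s, str) else str(s))
    return token.decode("utf-8")  # Fernet tokens are base64 text

spark.udf.register("my_udf", encrypt_value, StringType())
sqldf1 = spark.sql("select my_udf(" + column + ") from df1")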

Source: https://stackoverflow.com/questions/54033132/iterate-spark-dataframe-specific-column
