PySpark DataFrame withColumn command not working

独厮守ぢ 2021-01-16 17:29

I have an input DataFrame, df_input (updated):

|comment|inp_col|inp_val|
|11     |a      |a1     |
|12     |a      |a2     |
         


        
3 answers
  •  失恋的感觉
    2021-01-16 18:01

    Can you try out this solution? Your approach may run into a whole lot of problems.

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    # Reuse the active session if there is one (e.g. in a notebook or shell)
    spark = SparkSession.builder.getOrCreate()

    # Test data
    tst = spark.createDataFrame(
        [(1, 'a', '3'), (1, 'a', '4'), (1, 'b', '5'), (1, 'b', '7'),
         (2, 'c', '&b'), (2, 'c', '&a'), (2, 'd', '&b')],
        schema=['col1', 'col2', 'col3'])
    # Extract the character that follows the '&' marker (empty string for single-character plain values)
    tst_1 = tst.withColumn("col3_extract", F.substring(F.col('col3'), 2, 1))
    # Select the rows whose values will be substituted in; withColumnRenamed also avoids Spark self-join issues
    # The substring search can also be done with a regex function
    tst_filter = tst.where(~F.col('col3').contains('&')).withColumnRenamed('col2', 'col2_collect')
    # For the selected rows, collect the col3 values into a list per key
    tst_clct = tst_filter.groupby('col2_collect').agg(F.collect_list('col3').alias('col3_collect'))
    # Join the main table with the collected lists
    tst_join = tst_1.join(tst_clct, on=tst_1.col3_extract == tst_clct.col2_collect, how='left').drop('col2_collect')
    # Plain values become a one-element array; '&a', '&b' references take the collected list
    tst_result = tst_join.withColumn("result", F.when(~F.col('col3').contains('&'), F.array(F.col('col3'))).otherwise(F.col('col3_collect')))
    tst_result.show()
    

    Results:

    +----+----+----+------------+------------+------+
    |col1|col2|col3|col3_extract|col3_collect|result|
    +----+----+----+------------+------------+------+
    |   2|   c|  &a|           a|      [3, 4]|[3, 4]|
    |   2|   c|  &b|           b|      [7, 5]|[7, 5]|
    |   2|   d|  &b|           b|      [7, 5]|[7, 5]|
    |   1|   a|   3|            |        null|   [3]|
    |   1|   a|   4|            |        null|   [4]|
    |   1|   b|   5|            |        null|   [5]|
    |   1|   b|   7|            |        null|   [7]|
    +----+----+----+------------+------------+------+
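
To sanity-check what the PySpark steps above are meant to compute without starting a Spark session, the same logic can be sketched in plain Python. This is only an illustration under assumptions: the names `collected` and `resolve` are made up for this sketch and are not part of the answer's code, and the lists come out in insertion order here, whereas `collect_list` in Spark does not guarantee an order (which is why the table shows `[7, 5]`).

```python
from collections import defaultdict

# Same test rows as the PySpark example: (col1, col2, col3)
rows = [(1, 'a', '3'), (1, 'a', '4'), (1, 'b', '5'), (1, 'b', '7'),
        (2, 'c', '&b'), (2, 'c', '&a'), (2, 'd', '&b')]

# Step 1: for rows holding plain values (no '&'), collect col3 per col2 key.
# This mirrors the where(...) + groupby(...).agg(collect_list(...)) step.
collected = defaultdict(list)
for _, col2, col3 in rows:
    if '&' not in col3:
        collected[col2].append(col3)


def resolve(col3):
    """Hypothetical helper: return the 'result' value for one col3 entry."""
    if '&' not in col3:
        # Plain value: wrap it in a one-element list, like F.array(col3)
        return [col3]
    # Reference such as '&a': look up the character after '&'
    # (col3[1:2] corresponds to substring(col3, 2, 1), which is 1-based).
    # .get returns None when no key matches, like the left join's null.
    return collected.get(col3[1:2])


result = [(col1, col2, col3, resolve(col3)) for col1, col2, col3 in rows]
for row in result:
    print(row)
```

Rows with `&a` resolve to the values collected under key `a`, rows with `&b` to those under `b`, and a reference to a key that has no plain values yields `None`, matching the null a left join would produce.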
    
