Spark Streaming: access DataFrame columns and add a new column via a Redis lookup

跟風遠走 submitted on 2021-01-01 17:49:09

Question


In my previous question (Spark Structured Streaming dynamic lookup with Redis), I managed to reach Redis with mapPartitions, thanks to https://stackoverflow.com/users/689676/fe2s

I tried to use mapPartitions, but there is one point I could not solve: how can I access the columns of each row while iterating in the code below? I want to enrich each row using lookup fields kept in Redis. I found something like the following, but how can I read the DataFrame columns and add a new column by looking it up in Redis? I would really appreciate any help. Thanks.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
// Assumption: RedisClient is the scala-redis client
import com.redis.RedisClient

// Appends two placeholder values to every row
def transformRow(row: Row): Row = {
  Row.fromSeq(row.toSeq ++ Array[Any]("val1", "val2"))
}

def transformRows(iter: Iterator[Row]): Iterator[Row] = {
  val redisConn = new RedisClient("xxx.xxx.xx.xxx", 6379, 1, Option("Secret123"))
  println(redisConn.get("ModelValidityPeriodName").getOrElse(""))
  // want to reach DataFrame column here
  redisConn.close()
  iter.map(transformRow)
}

// Original schema plus the two appended string columns
val newSchema = StructType(
  raw_customer_df.schema.fields ++ Array(
    StructField("ModelValidityPeriod", StringType, false),
    StructField("ModelValidityPeriod2", StringType, false)
  )
)

spark.sqlContext.createDataFrame(raw_customer_df.rdd.mapPartitions(transformRows), newSchema).show

Answer 1:


The iter parameter is an iterator over the DataFrame rows of one partition. So, if I understood your question correctly, you can access column values by iterating over iter and calling

row.getAs[Column_Type](column_name)

Something like this

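def transformRows(iter: Iterator[Row]): Iterator[Row] = {
  val redisConn = new RedisClient("xxx.xxx.xx.xxx", 6379, 1, Option("Secret123"))
  println(redisConn.get("ModelValidityPeriodName").getOrElse(""))
  val res = iter.map { row =>
    // read a column of the current row by name
    val columnValue = row.getAs[String]("column_name")
    // lookup in redis; get returns an Option (see the getOrElse above), so unwrap it
    val valueFromRedis = redisConn.get(...).getOrElse("")
    // append the looked-up value; the schema passed to createDataFrame needs one field per appended value
    Row.fromSeq(row.toSeq ++ Array[Any](valueFromRedis))
  }.toList // iter.map is lazy: force all lookups before the connection is closed

  redisConn.close()
  res.iterator
}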
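
The .toList in the code above matters because Iterator.map is lazy: without it, the connection would already be closed when the first lookup runs. The following is a minimal plain-Scala sketch of that behaviour, not Spark or Redis code. FakeConn, LazyLookupDemo and the lookup map are hypothetical stand-ins invented for illustration, and the third variant (closing once the iterator is exhausted) is an alternative I am assuming may be useful for large partitions, not something from the answer.

```scala
// Hypothetical stand-in for a Redis client; not a real library class.
final class FakeConn(data: Map[String, String]) {
  private var open = true

  // Fails after close(), like a lookup on a disconnected client would.
  def get(key: String): Option[String] = {
    if (!open) throw new IllegalStateException("connection already closed")
    data.get(key)
  }

  def close(): Unit = { open = false }
}

object LazyLookupDemo {
  // Hypothetical lookup data standing in for the values kept in Redis.
  val lookup: Map[String, String] = Map("a" -> "1", "b" -> "2")

  // Pitfall: map is lazy, so close() runs before any lookup happens.
  def closeTooEarly(iter: Iterator[String]): Iterator[String] = {
    val conn = new FakeConn(lookup)
    val res = iter.map(k => conn.get(k).getOrElse(""))
    conn.close()
    res
  }

  // Same idea as the answer: force every lookup, then close.
  // Holds the whole partition in memory.
  def materializeThenClose(iter: Iterator[String]): Iterator[String] = {
    val conn = new FakeConn(lookup)
    val res = iter.map(k => conn.get(k).getOrElse("")).toList
    conn.close()
    res.iterator
  }

  // Alternative: stay lazy and close once the source is exhausted.
  def closeWhenExhausted(iter: Iterator[String]): Iterator[String] = {
    val conn = new FakeConn(lookup)
    new Iterator[String] {
      def hasNext: Boolean = {
        val more = iter.hasNext
        if (!more) conn.close() // safe to call repeatedly
        more
      }
      def next(): String = conn.get(iter.next()).getOrElse("")
    }
  }
}
```

Consuming the result of closeTooEarly throws, while the other two variants return the looked-up values. The lazy variant avoids buffering a whole partition, but it only closes the connection if the consumer reads the iterator to the end.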


Source: https://stackoverflow.com/questions/65240504/spark-streaming-reach-dataframe-columns-and-add-new-column-looking-up-to-redis
