Spark Structured Streaming and Spark-Ml Regression

给你一囗甜甜゛ submitted on 2019-11-28 02:22:17
user8371915

Today (Spark 2.2 / 2.3) there is no support for machine learning in Structured Streaming and there is no ongoing work in this direction. Please follow SPARK-16424 to track future progress.

You can however:

  • Train iterative, non-distributed models using the foreach sink and some form of external state storage. At a high level, a regression model could be implemented like this:

    • Fetch the latest model when ForeachWriter.open is called and initialize a loss accumulator (not in the Spark sense, just a local variable) for the partition.
    • Compute the loss for each record in ForeachWriter.process and update the accumulator.
    • Push the losses to the external store when ForeachWriter.close is called.
    • This leaves the external store in charge of computing the gradient and updating the model, with the implementation depending on the store.
  • Try to hack SQL queries (see https://github.com/holdenk/spark-structured-streaming-ml by Holden Karau)
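
The foreach-sink approach from the first bullet can be sketched roughly as follows. This is only an illustration that runs without Spark: RegressionWriter copies the open/process/close shape of ForeachWriter (PySpark accepts an object of this shape in writeStream.foreach from 2.4 onward), while InMemoryModelStore, run_partition, the one-feature linear model, the squared-error loss and the learning rate are all hypothetical choices made for the example. The writer also accumulates gradient contributions next to the loss, which is an assumption added so that the store has something to update the model with.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class InMemoryModelStore:
    """Hypothetical stand-in for the external state store.

    On a real cluster the writer is shipped to executors, so this would be
    a client for a database or key-value service, not an in-process object.
    """
    weight: float = 0.0
    bias: float = 0.0
    learning_rate: float = 0.2  # assumed value, chosen for this toy data
    updates: int = 0

    def fetch_model(self) -> Tuple[float, float]:
        return self.weight, self.bias

    def push(self, grad_w: float, grad_b: float, count: int) -> None:
        # The store owns the update rule: one averaged gradient step
        # per partition that reported at least one record.
        if count == 0:
            return
        self.weight -= self.learning_rate * grad_w / count
        self.bias -= self.learning_rate * grad_b / count
        self.updates += 1


class RegressionWriter:
    """Follows the open/process/close lifecycle of ForeachWriter."""

    def __init__(self, store: InMemoryModelStore) -> None:
        self.store = store

    def open(self, partition_id: int, epoch_id: int) -> bool:
        # Fetch the latest model and reset the partition-local accumulators.
        self.weight, self.bias = self.store.fetch_model()
        self.loss = 0.0
        self.grad_w = 0.0
        self.grad_b = 0.0
        self.count = 0
        return True

    def process(self, row: Tuple[float, float]) -> None:
        # Squared-error loss for y ~ weight * x + bias (assumed model).
        x, y = row
        error = self.weight * x + self.bias - y
        self.loss += 0.5 * error * error
        self.grad_w += error * x
        self.grad_b += error
        self.count += 1

    def close(self, error: Optional[Exception]) -> None:
        # Push the partition's statistics only if processing succeeded.
        if error is None:
            self.store.push(self.grad_w, self.grad_b, self.count)


def run_partition(writer: RegressionWriter, partition_id: int,
                  epoch_id: int, rows: Iterable[Tuple[float, float]]) -> float:
    """Drive the writer by hand, the way the sink would for one partition."""
    if writer.open(partition_id, epoch_id):
        try:
            for row in rows:
                writer.process(row)
        except Exception as exc:
            writer.close(exc)
            raise
        writer.close(None)
    return writer.loss


store = InMemoryModelStore()
writer = RegressionWriter(store)
# Toy micro-batch generated from y = 2x + 1.
batch = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
losses = [run_partition(writer, 0, epoch, batch) for epoch in range(200)]
print(store.weight, store.bias, losses[0], losses[-1])
```

Here every micro-batch is treated as a single partition. With several partitions the store would receive one push per partition and would have to decide how to combine them (for example, apply them sequentially or average them), which is the implementation-dependent part mentioned above.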
