How to add an incremental ID column to a table in Spark SQL

Question

I'm working on a Spark MLlib algorithm. The dataset I have is in this form:

"Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":.(there are more values similar to these)

I'm trying to encode String values as numeric values, so I tried using zipWithUniqueId to get a unique value for each of the string values. For some reason I'm not able to save the modified dataset to disk. Can I do this in some way using Spark SQL? Or what would be a better approach?

Answer 1:

Scala

import org.apache.spark.sql.functions.monotonically_increasing_id
val dataFrame1 = dataFrame0.withColumn("index", monotonically_increasing_id())

Java

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;
Dataset<Row> dataFrame1 = dataFrame0.withColumn("index", functions.monotonically_increasing_id());
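
One caveat with this approach: according to the Spark documentation, monotonically_increasing_id() produces 64-bit IDs that are unique and monotonically increasing, but not consecutive. The current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The plain-Java sketch below needs no Spark; it only mimics that bit layout to show what the "index" column would likely contain. The class name MonotonicIdSketch and the method idFor are hypothetical and are not part of Spark.

```java
// Plain-Java sketch (no Spark dependency) of the documented ID layout.
// MonotonicIdSketch and idFor are hypothetical names used for illustration.
class MonotonicIdSketch {

    // Partition ID in the upper 31 bits, per-partition record number
    // in the lower 33 bits.
    static long idFor(int partitionId, long recordNumber) {
        return ((long) partitionId << 33) + recordNumber;
    }

    public static void main(String[] args) {
        // Assumed layout for the demo: two partitions with three rows each.
        for (int partition = 0; partition < 2; partition++) {
            for (long row = 0; row < 3; row++) {
                System.out.println("partition " + partition + ", row " + row
                        + " -> " + idFor(partition, row));
            }
        }
    }
}
```

With two partitions of three rows each, the IDs jump from 2 to 8589934592 (1L << 33) at the partition boundary. If gap-free IDs are required, zipWithIndex on the underlying RDD or row_number() over a window are possible alternatives, though both involve extra work compared with monotonically_increasing_id().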

Source: https://stackoverflow.com/questions/38377101/how-to-add-a-incremental-column-id-for-a-table-in-spark-sql

Tags

apache-spark, apache-spark-sql, spark-dataframe, apache-spark-mllib
