How to add an incremental column ID for a table in Spark SQL

Submitted by 混江龙づ霸主 on 2019-12-07 04:58:39

Question


I'm working on a Spark MLlib algorithm. The dataset I have is in this form:

Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":.(there are more values similar to these)

I'm trying to encode the String values as numeric values, so I tried using zipWithUniqueId to get a unique value for each of the string values. For some reason I'm not able to save the modified dataset to disk. Can I do this in any way using Spark SQL, or what would be a better approach for this? (A rough sketch of the mapping I have in mind is shown below.)
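
A minimal Scala sketch of that zipWithUniqueId idea, assuming a DataFrame with a "Company" column; the sample rows, SparkSession setup, and output path are illustrative placeholders, not from the original question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("encode-strings").getOrCreate()
import spark.implicits._

// Hypothetical input resembling the dataset in the question
val df = Seq(("XXXX", "XYZ"), ("YYYY", "ABC")).toDF("Company", "CurrentTitle")

// Pair each distinct company string with a unique Long via zipWithUniqueId
val companyIds = df.select("Company").distinct().rdd
  .map(_.getString(0))
  .zipWithUniqueId()
  .toDF("Company", "Company_id")

// Join the numeric IDs back onto the original rows and save to disk
val encoded = df.join(companyIds, Seq("Company"))
encoded.write.mode("overwrite").parquet("/tmp/encoded")  // placeholder path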


Answer 1:


Scala

import org.apache.spark.sql.functions.monotonically_increasing_id
val dataFrame1 = dataFrame0.withColumn("index", monotonically_increasing_id())

Java

import org.apache.spark.sql.functions;

Dataset<Row> dataFrame1 = dataFrame0.withColumn("index", functions.monotonically_increasing_id());


Source: https://stackoverflow.com/questions/38377101/how-to-add-a-incremental-column-id-for-a-table-in-spark-sql
