How to create an index on a Spark table?

Submitted by 拟墨画扇 on 2019-12-22 11:26:22

Question


I know Spark SQL is almost the same as Hive.

I have created a table, and when I run a Spark SQL query to create an index on it, I always get this error:

Error in SQL statement: AnalysisException: mismatched input '' expecting AS near ')' in create index statement

The Spark SQL query I am using is:

CREATE INDEX word_idx ON TABLE t (id)

The data type of id is bigint. Before this, I also tried to create an index on the "word" column of the same table, and it gave me the same error.

So, is there any way to create an index through a Spark SQL query?


Answer 1:


There's no way to do this through a Spark SQL query. But there is an RDD method called zipWithIndex: you can convert the DataFrame to an RDD, call zipWithIndex, and convert the resulting RDD back to a DataFrame.
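A minimal Scala sketch of that approach, assuming a table named t as in the question (the session setup and the row_idx column name are assumptions for illustration):

// Add a 0-based row index to an existing DataFrame via RDD.zipWithIndex.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val spark = SparkSession.builder().appName("zipWithIndexExample").getOrCreate()

val df = spark.table("t")  // assumed table name from the question

// zipWithIndex pairs each Row with its position in the RDD, then we append
// that position to the end of the row.
val rddWithIndex = df.rdd.zipWithIndex.map {
  case (row, idx) => Row.fromSeq(row.toSeq :+ idx)
}

// Extend the original schema with the new index column.
val schemaWithIndex =
  StructType(df.schema.fields :+ StructField("row_idx", LongType, nullable = false))

val dfWithIndex = spark.createDataFrame(rddWithIndex, schemaWithIndex)
dfWithIndex.show()

Note that this adds a row-number column rather than a database-style index, and it forces a conversion through RDDs, so it is best done once rather than per query.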

See this community Wiki article for a full-blown solution.

Another approach could be to use Spark MLlib's StringIndexer.
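A minimal sketch of StringIndexer, assuming a string column named "word" as mentioned in the question (the sample data and the word_idx output column name are assumptions for illustration):

// StringIndexer maps each distinct string value to a numeric label,
// with the most frequent value receiving 0.0.
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StringIndexerExample").getOrCreate()
import spark.implicits._

val df = Seq("apple", "banana", "apple", "cherry").toDF("word")

val indexer = new StringIndexer()
  .setInputCol("word")
  .setOutputCol("word_idx")

// fit() learns the value-to-index mapping; transform() appends the new column.
val indexed = indexer.fit(df).transform(df)
indexed.show()

This produces a categorical encoding of the column's values, which is useful for ML pipelines, but again it is not an index in the SQL sense.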



Source: https://stackoverflow.com/questions/36110701/how-to-create-index-in-spark-table
