select multiple elements with group by in spark.sql


Question


Is there any way in Spark SQL to group by a table while selecting multiple columns? The code I am using:

val df = spark.read.json("//path")
df.createOrReplaceTempView("GETBYID")

Now I do a group by like this:

val sqlDF = spark.sql(
  "SELECT count(customerId) FROM GETBYID group by customerId");

but when I try:

val sqlDF = spark.sql(
  "SELECT count(customerId),customerId,userId FROM GETBYID group by customerId");

Spark gives an error:

org.apache.spark.sql.AnalysisException: expression 'getbyid.userId' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

Is there any possible way to do that?


Answer 1:


Yes, it's possible, and the error message you attached describes all the options. You can either add userId to the GROUP BY:

val sqlDF = spark.sql("SELECT count(customerId),customerId,userId FROM GETBYID group by customerId, userId");

or use first():

val sqlDF = spark.sql("SELECT count(customerId),customerId,first(userId) FROM GETBYID group by customerId");
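For reference, the same two options can also be written with the DataFrame API instead of SQL (a minimal sketch, assuming the same df as above with customerId and userId columns):

import org.apache.spark.sql.functions.{count, first}

// Option 1: make userId part of the grouping key
val byBoth = df.groupBy("customerId", "userId")
  .agg(count("customerId"))

// Option 2: group by customerId only and keep one arbitrary userId per group
val byCustomer = df.groupBy("customerId")
  .agg(count("customerId"), first("userId"))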



Answer 2:


And if you want to keep all the occurrences of userId, you can do this:

spark.sql("SELECT count(customerId), customerId, collect_list(userId) FROM GETBYID group by customerId")

collect_list gathers all the userId values in each group into an array, so no rows are discarded.
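The same aggregation in the DataFrame API (a sketch under the same assumptions; the column aliases cnt and userIds are just illustrative names):

import org.apache.spark.sql.functions.{count, collect_list}

// One output row per customerId; all userId values for that customer
// are gathered into an array column
val grouped = df.groupBy("customerId")
  .agg(count("customerId").as("cnt"), collect_list("userId").as("userIds"))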



Source: https://stackoverflow.com/questions/41421675/select-multiple-elements-with-group-by-in-spark-sql
