Hive

Hive: Cleaner way to SELECT AS and GROUP BY

南笙酒味 posted on 2020-05-25 06:46:25
Question: I am trying to write Hive SQL like this: SELECT count(1), substr(date, 1, 4) as year FROM *** GROUP BY year. But Hive cannot recognize the alias 'year' and complains: FAILED: SemanticException [Error 10004]: Line 1:79 Invalid table alias or column reference 'year'. One solution (Hive: SELECT AS and GROUP BY) suggests using 'GROUP BY substr(date, 1, 4)'. It works! However, in some cases the value I want to group by is generated from multiple lines of Hive function code, and it is very ugly to …
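
The excerpt is cut off, but a common workaround (not stated in the snippet) is to move the long expression into a subquery or CTE, so the outer query can legally GROUP BY the alias. Below is a minimal Spark-on-Hive sketch; the table name some_db.some_table is made up for illustration.

    import org.apache.spark.sql.SparkSession

    object GroupByAliasSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("group-by-alias")
          .enableHiveSupport()
          .getOrCreate()

        // The derived column is computed once in the inner query, so the outer query
        // may reference the alias in GROUP BY, however long the expression gets.
        val byYear = spark.sql(
          """
            |SELECT year, count(1) AS cnt
            |FROM (
            |  SELECT substr(`date`, 1, 4) AS year   -- the multi-line expression lives here
            |  FROM some_db.some_table               -- hypothetical table name
            |) t
            |GROUP BY year
          """.stripMargin)

        byYear.show()
        spark.stop()
      }
    }

The same subquery form also works from the Hive CLI or beeline; depending on the Hive version, positional GROUP BY (e.g. GROUP BY 2) may be another option when the corresponding position-alias setting is enabled.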

Assign same value when using lag function if column used in lag has same value

岁酱吖の posted on 2020-05-24 03:57:27
Question: I have a table in SQL whose contents are below:
+---+----------+----------+----------+--------+
| pk|    from_d|      to_d| load_date| row_num|
+---+----------+----------+----------+--------+
|111|2019-03-03|2019-03-03|2019-03-03|       1|
|111|2019-02-02|2019-02-02|2019-02-02|       2|
|111|2019-02-02|2019-02-02|2019-02-02|       2|
|111|2019-01-01|2019-01-01|2019-01-01|       3|
|222|2019-03-03|2019-03-03|2019-03-03|       1|
|222|2019-01-01|2019-01-01|2019-01-01|       2|
|333|2019-02-02|2019-02-02|2019-02-02|       1|
|333|2019-01-01|2019-01-01 …
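
The question text is truncated, but when duplicate rows must all receive the same lag() result, one common pattern is to compute the window function over the distinct rows and join it back. A hedged Spark SQL sketch, assuming a hypothetical my_table with the columns shown above:

    import org.apache.spark.sql.SparkSession

    object LagOverDistinctSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("lag-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Compute lag() once per distinct (pk, row_num) group, then join the result back,
        // so duplicate rows in the same group receive the same lagged value.
        val result = spark.sql(
          """
            |WITH distinct_rows AS (
            |  SELECT DISTINCT pk, from_d, to_d, row_num
            |  FROM my_table                          -- hypothetical table name
            |),
            |lagged AS (
            |  SELECT pk, row_num,
            |         lag(to_d) OVER (PARTITION BY pk ORDER BY row_num) AS prev_to_d
            |  FROM distinct_rows
            |)
            |SELECT t.*, l.prev_to_d
            |FROM my_table t
            |JOIN lagged l
            |  ON t.pk = l.pk AND t.row_num = l.row_num
          """.stripMargin)

        result.show()
        spark.stop()
      }
    }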

In Spark Streaming, is it possible to upsert batch data from Kafka to Hive?

旧城冷巷雨未停 posted on 2020-05-17 08:52:05
Question: My plan is: 1. use Spark Streaming to load data from Kafka every period, e.g. 1 minute; 2. convert the data loaded each minute into a DataFrame; 3. upsert the DataFrame into a Hive table (a table storing all history data). Currently, I have successfully implemented steps 1-2, and I want to know whether there is any practical way to realize step 3. In detail: 1. load the latest history table with a certain partition in Spark Streaming; 2. use the batch DataFrame to join the history table/DataFrame with …
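
Step 3 is not answered in the excerpt. One hedged way to sketch it is Structured Streaming's foreachBatch plus a join-and-rewrite of the history table; every concrete name below (broker, topic, db.history, db.history_staging, the two-column layout) is made up for illustration. With a transactional (ACID) Hive table, issuing a MERGE statement through the Hive Warehouse Connector would be an alternative approach.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.streaming.Trigger

    object KafkaToHiveUpsertSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hive-upsert")
          .enableHiveSupport()
          .getOrCreate()

        // Steps 1-2 from the question: a one-minute micro-batch stream from Kafka.
        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical broker
          .option("subscribe", "events")                       // hypothetical topic
          .load()
          .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

        // Step 3 (sketch): upsert inside foreachBatch. A plain Hive table cannot be updated
        // row by row, so the micro-batch is merged with the history via a full outer join and
        // the table is rebuilt through a staging table (overwriting a table that is read in
        // the same plan is not allowed, hence the extra hop).
        val query = stream.writeStream
          .trigger(Trigger.ProcessingTime("1 minute"))
          .option("checkpointLocation", "/tmp/chk-kafka-hive")  // hypothetical checkpoint dir
          .foreachBatch { (batch: DataFrame, batchId: Long) =>
            batch.createOrReplaceTempView("batch_updates")
            spark.sql(
              """
                |INSERT OVERWRITE TABLE db.history_staging
                |SELECT COALESCE(b.id, h.id)           AS id,
                |       COALESCE(b.payload, h.payload) AS payload
                |FROM db.history h
                |FULL OUTER JOIN batch_updates b ON h.id = b.id
              """.stripMargin)
            spark.sql("INSERT OVERWRITE TABLE db.history SELECT * FROM db.history_staging")
          }
          .start()

        query.awaitTermination()
      }
    }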

How to resolve com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast… Java Spark

雨燕双飞 posted on 2020-05-17 06:31:05
Question: Hi, I am new to Java Spark and have been looking for solutions for a couple of days. I am working on loading MongoDB data into a Hive table; however, I get this error during saveAsTable: com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast STRING into a StructType(StructField(oid,StringType,true)) (value: BsonString{value='54d3e8aeda556106feba7fa2'}). I've tried increasing the sampleSize, different mongo-spark-connector versions, ... but none of them worked …
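
This exception typically appears when schema inference samples documents in which a field is sometimes an ObjectId (a struct with an oid member) and sometimes a plain string. One direction that is often suggested, rather than a guaranteed fix, is to supply an explicit schema so inference is bypassed; whether a user-supplied schema is honored depends on the connector version. A Scala sketch (the same options apply from Java); the URI, field names, and target table are invented:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    object MongoExplicitSchemaSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("mongo-to-hive")
          .config("spark.mongodb.input.uri", "mongodb://host:27017/mydb.mycoll")  // hypothetical URI
          .enableHiveSupport()
          .getOrCreate()

        // Declare the conflicting field's type up front instead of relying on sampling.
        val schema = StructType(Seq(
          StructField("_id", StructType(Seq(StructField("oid", StringType, nullable = true))), nullable = true),
          StructField("someField", StringType, nullable = true)   // placeholder field
        ))

        val df = spark.read
          .format("mongo")      // mongo-spark-connector data source
          .schema(schema)
          .load()

        df.write.mode("overwrite").saveAsTable("db.mongo_copy")   // hypothetical Hive table
        spark.stop()
      }
    }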

AWS Athena null values are replaced by N after table is created. How to keep them as they are?

大憨熊 posted on 2020-05-17 06:22:05
Question: I'm creating a table in Athena from CSV data in S3. The data has some quoted columns, so I use: ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", 'serialization.null.format' = ''). The SerDe works fine, but the null values in the resulting table are replaced with N. How can I keep the null values empty (or as NULL, etc.) rather than as N? Thanks. Source: https://stackoverflow.com/questions/61020631/aws-athena-null-values-are-replaced-by-n

SaveAsTable in Spark Scala: HDP3.x

不羁岁月 posted on 2020-05-17 06:08:08
Question: I have one DataFrame in Spark and I'm saving it to Hive as a table, but I get the error message below. java.lang.RuntimeException: com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector does not allow create table as select. at scala.sys.package$.error(package.scala:27) Can anyone please help me with how I should save this as a table in Hive? val df3 = df1.join(df2, df1("inv_num") === df2("inv_num") // Join both dataframes on id column ).withColumn("finalSalary", when(df1("salary") < df2("salary"), …
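
On HDP 3.x the Hive Warehouse Connector rejects saveAsTable (create-table-as-select); its documented write path goes through the connector's own data source. A sketch under the assumption that the HWC jar is on the classpath; the database, target table, and the stand-in for df3 are hypothetical, and exact class names and options vary across HDP 3.x releases.

    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._
    import org.apache.spark.sql.SparkSession

    object HwcWriteSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hwc-write").getOrCreate()

        // Build an HWC session and write through the connector instead of saveAsTable.
        val hive = HiveWarehouseSession.session(spark).build()
        hive.setDatabase("my_db")                    // hypothetical database

        val df3 = spark.table("some_staging_view")   // stand-in for the joined DataFrame

        df3.write
          .format(HIVE_WAREHOUSE_CONNECTOR)          // constant provided by the connector
          .option("table", "my_hive_table")          // hypothetical target table
          .save()

        spark.stop()
      }
    }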

Creating and using Spark-Hive UDF for Date

坚强是说给别人听的谎言 posted on 2020-05-17 05:54:05
Question: Note: this question is linked from this question: Creating UDF function with NonPrimitive Data Type and using in Spark-sql Query: Scala. Hi, I have created one method in Scala. package test.udf.demo object UDF_Class { def transformDate( dateColumn: String, df: DataFrame) : DataFrame = { val sparksession = SparkSession.builder().appName("App").getOrCreate() val d=df.withColumn("calculatedCol", month(to_date(from_unixtime(unix_timestamp(col(dateColumn), "dd-MM-yyyy"))))) df.withColumn("date1", when …
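
The excerpt's transformDate takes a whole DataFrame, which is why it cannot be registered as a UDF: a UDF operates on individual column values. A hedged sketch of one way to make the same month calculation callable from Spark SQL; the table db.orders and column order_date are invented for the example.

    import org.apache.spark.sql.SparkSession

    object DateUdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("date-udf")
          .enableHiveSupport()
          .getOrCreate()

        // Same idea as the question's month(to_date(...)) expression, written as a plain
        // Scala function over a single value and registered for use in SQL queries.
        spark.udf.register("to_month", (s: String) => {
          if (s == null) null.asInstanceOf[java.lang.Integer]
          else java.lang.Integer.valueOf(
            java.time.LocalDate.parse(s,
              java.time.format.DateTimeFormatter.ofPattern("dd-MM-yyyy")).getMonthValue)
        })

        // Hypothetical table and column, only to show the UDF being called from SQL.
        spark.sql("SELECT order_date, to_month(order_date) AS calculatedCol FROM db.orders").show()

        spark.stop()
      }
    }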

Hive table gives error: Unimplemented type

▼魔方 西西 posted on 2020-05-16 22:36:55
Question: Using spark-sql-2.4.1, I am writing a Parquet file whose schema contains |-- avg: double (nullable = true). While reading the same data using val df = spark.read.format("parquet").load(); I get the error: UnsupportedOperationException: Unimplemented type: DoubleType. So what is wrong here, and how can I fix it? Stack trace: Caused by: java.lang.UnsupportedOperationException: Unimplemented type: DoubleType at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch …
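
The stack trace shows the vectorized Parquet reader's integer batch path being asked for DoubleType, which usually means the Parquet files physically store an integer type while the table or requested schema says double. A diagnostic sketch (path and table names are placeholders); disabling the vectorized reader is only a sometimes-suggested stopgap, and the durable fix is making the column's type consistent between the files and the table definition.

    import org.apache.spark.sql.SparkSession

    object ParquetDoubleTypeCheckSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-type-check")
          .enableHiveSupport()
          .getOrCreate()

        // Compare the schema carried by the Parquet footers with the metastore schema.
        val path = "/warehouse/tablespace/managed/hive/db.db/my_table"   // hypothetical location
        spark.read.parquet(path).printSchema()     // schema from the Parquet files
        spark.table("db.my_table").printSchema()   // schema from the Hive metastore

        // Stopgap sometimes suggested while the schemas are reconciled:
        spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
        spark.table("db.my_table").show(5)

        spark.stop()
      }
    }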

Why am I getting negative allocated mappers in Tez job? Vertex failure?

流过昼夜 posted on 2020-05-16 05:13:10
Question: I'm trying to use the PhoenixStorageHandler as documented here, and to populate it with the following query in the beeline shell: insert into table pheonix_table select * from hive_table; I get the following breakdown of the mappers in the Tez session: ... INFO : Map 1: 0(+50)/50 INFO : Map 1: 0(+50)/50 INFO : Map 1: 0(+50,-2)/50 INFO : Map 1: 0(+50,-3)/50 ... before the session crashes with a very long error message (422 lines) about vertex failure: Error: Error while processing statement: FAILED: …