Hive

[Hive] I get "ArrayIndexOutOfBoundsException" while querying the Hive database

Submitted by 最后都变了- on 2021-02-10 09:23:52
Question: I always get "ArrayIndexOutOfBoundsException" while I query the Hive database (both hive-0.11.0 and hive-0.12.0), but sometimes not. Here is the error:

    java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.ArrayIndexOutOfBoundsException: 0
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:313)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:232)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:539)
        at org.apache.hadoop

How to prevent null values from being stored in HBase (pandas/Python)?

Submitted by 浪子不回头ぞ on 2021-02-10 06:21:15
Question: I have some sample data as below:

       test_a  test_b  test_c  test_d  test_date
    -------------------------------------------------
    1  a       500     0.1     111     20191101
    2  a       NaN     0.2     NaN     20191102
    3  a       200     0.1     111     20191103
    4  a       400     NaN     222     20191104
    5  a       NaN     0.2     333     20191105

I would like to store this data in HBase, and I use the code below to achieve it:

    from test.db import impala, hbasecon, HiveClient
    import pandas as pd

    sql = """
    SELECT test_a
          ,test_b
          ,test_c
          ,test_d
          ,test_date
    FROM table_test
    """
    conn_impa =
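A common approach, regardless of which HBase client library is used, is to drop NaN/None cells from each row before writing, so HBase simply stores no cell for those columns (it is a sparse store). A minimal sketch in plain Python:

```python
import math

def drop_nulls(row: dict) -> dict:
    # Keep only cells with real values; NaN floats and None are removed,
    # so the corresponding HBase columns are simply never written.
    return {k: v for k, v in row.items()
            if v is not None and not (isinstance(v, float) and math.isnan(v))}

# e.g. with a hypothetical happybase-style client:
#   for key, row in df.iterrows():
#       table.put(str(key), drop_nulls(row.to_dict()))
```

In pandas terms, the same filtering can be done per row with `row.dropna()` before converting to a dict.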

Spark/Scala load Oracle Table to Hive

Submitted by 对着背影说爱祢 on 2021-02-10 05:59:06
Question: I am loading a few Oracle tables into Hive. It seems to be working, but 2 tables fail with this error:

    IllegalArgumentException: requirement failed: Decimal precision 136 exceeds max precision 38

I checked the Oracle table and there is no column with Decimal (136) precision in the source. Here is the Spark/Scala code in spark-shell:

    val df_oracle = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@hostname:port:SID")
      .option("user", userName)
      .option("password", passWord)
      .option("driver", "oracle
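A common cause of this error is an Oracle NUMBER column declared without explicit precision/scale, which the JDBC reader then reports with an out-of-range precision. One possible fix, sketched below under the assumption of Spark 2.3+ (the column names are placeholders, not from the source), is to override the mapping with the `customSchema` JDBC option:

```scala
// Sketch: force problem NUMBER columns to a legal Spark decimal type.
val df_oracle = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@hostname:port:SID")
  .option("user", userName)
  .option("password", passWord)
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "SCHEMA.TABLE_NAME")                        // placeholder
  .option("customSchema", "AMOUNT DECIMAL(38,10), QTY DECIMAL(38,0)") // placeholders
  .load()
```

Alternatively, the precision can be fixed on the Oracle side by wrapping the query as the `dbtable`, e.g. `(SELECT CAST(amount AS NUMBER(38,10)) AS amount, ... FROM t)`.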

Getting Error 10293 while inserting a row into a Hive table having an array as one of the fields

Submitted by ﹥>﹥吖頭↗ on 2021-02-10 05:55:07
Question: I have a Hive table created using the following query:

    create table arraytbl (
      id string,
      model string,
      cost int,
      colors array<string>,
      size array<float>
    )
    row format delimited
    fields terminated by ','
    collection items terminated by '#';

Now, while trying to insert a row:

    insert into mobilephones
    values ("AA", "AAA", 5600, colors("red","blue","green"), size(5.6,4.3));

I get the following error:

    FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of
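Hive's `INSERT ... VALUES` cannot evaluate functions or complex-type constructors, which is what triggers Error 10293 here. The usual workaround is an `INSERT ... SELECT` using Hive's `array()` constructor, sketched below against the table from the question:

```sql
INSERT INTO TABLE arraytbl
SELECT 'AA', 'AAA', 5600,
       array('red', 'blue', 'green'),                 -- array() builds an array<string>
       array(CAST(5.6 AS FLOAT), CAST(4.3 AS FLOAT)); -- and an array<float>
```

On older Hive versions a `SELECT` without a `FROM` clause is not accepted; in that case, select the same literals from a one-row dummy table.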

Install Hive on Windows: 'hive' is not recognized as an internal or external command, operable program or batch file

Submitted by 筅森魡賤 on 2021-02-10 05:21:04
Question: I have installed Hadoop 2.7.3 on Windows and I am able to start the cluster. Now I would like to add Hive, and I went through the steps below:

1. Downloaded db-derby-10.12.1.1-bin.zip, unpacked it, and started startNetworkServer -h 0.0.0.0.
2. Downloaded apache-hive-1.1.1-bin.tar.gz from a mirror site and unpacked it. Created hive-site.xml with the properties below:

    javax.jdo.option.ConnectionURL
    javax.jdo.option.ConnectionDriverName
    hive.server2.enable.impersonation
    hive.server2.authentication
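The error message itself usually just means the `hive` launcher script is not on PATH. A minimal sketch for the Windows command prompt (the install directory below is an assumption; adjust it to wherever the tarball was unpacked):

```bat
:: Hypothetical unpack location of apache-hive-1.1.1-bin.tar.gz
set HIVE_HOME=C:\apache-hive-1.1.1-bin
set PATH=%PATH%;%HIVE_HOME%\bin
hive --version
```

Set HIVE_HOME and PATH permanently via System Properties > Environment Variables so they survive a new command prompt.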

Hundreds of millions of rows, second-level response: how exactly does Smartbi do it?

Submitted by ぃ、小莉子 on 2021-02-09 11:57:34
Smartbi seems to carry many labels: true Excel, complex reports, performance, self-service analytics, data mining, NLP... Around the "performance" label in particular there are many legends, for example its use in analyzing Mars-probe flight data, in one province's economic census, and in large-scale data mining at a bank.

Data-processing performance is the most basic requirement for a BI tool. Yet it is precisely this most basic requirement that best reveals a product's quality and lets it stand out among competitors.

So how does Smartbi achieve such strong data-processing performance?

1. Support for columnar databases

A traditional row-oriented database stores all the fields of a row together, row after row. For transactional operations such as writing a row into the database, modifying some fields of a row, or deleting an entire row, this layout is both intuitive and efficient.

For statistical analysis on a row store, however, this format is inefficient. For example, computing the year-over-year change in sales and profit by region, or each department's progress against targets, operates on only a few fields, yet the row store must read every field of every row. When only sales and profit are being analyzed, all the other fields, such as customer name, signing date, and account manager, are read in as well, wasting a great deal of resources. Indexes bring some improvement, but the storage wasted by a large number of indexes, and the time spent maintaining them, grow exponentially.

(Image source: web)

A columnar database stores the values of each data "column" together. When inserting a row
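The row-versus-column trade-off described above can be illustrated with a toy sketch (plain Python, not Smartbi code):

```python
# Row store: each record keeps all fields together.
rows = [
    {"region": "N", "sales": 500, "profit": 50, "customer": "a", "manager": "x"},
    {"region": "S", "sales": 200, "profit": 30, "customer": "b", "manager": "y"},
    {"region": "N", "sales": 400, "profit": 40, "customer": "c", "manager": "z"},
]

# Summing sales in a row store still walks every full record,
# pulling customer/manager data along for the ride.
total_row_store = sum(r["sales"] for r in rows)

# Column store: each column's values are kept contiguously,
# so an aggregate touches only the columns it actually needs.
columns = {k: [r[k] for r in rows] for k in rows[0]}
total_col_store = sum(columns["sales"])

assert total_row_store == total_col_store == 1100
```

In a real column store the contiguous column values also compress far better, which further reduces I/O for analytical scans.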

Creating a UDF with a non-primitive data type and using it in a Spark SQL query: Scala

Submitted by 梦想与她 on 2021-02-08 11:00:42
Question: I am creating a function in Scala which I want to use in my Spark SQL query. My query works fine in Hive, and also if I run the same query in Spark SQL. But I use the same query in multiple places, so I want to create it as a reusable function/method that I can just call whenever it is required. I have created the function below in my Scala class:

    def date_part(date_column: Column) = {
      val m1: Column = month(to_date(from_unixtime(unix_timestamp(date_column, "dd-MM-yyyy")))) // give value as 01
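One way to make such a helper callable from a SQL-string query (a sketch; the column and table names are placeholders, since the original function is truncated) is to keep the `Column`-based version for the DataFrame API and register a plain Scala function as a UDF for `spark.sql`:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// Column-based helper: usable directly in the DataFrame API.
def date_part(dateColumn: Column): Column =
  month(to_date(from_unixtime(unix_timestamp(dateColumn, "dd-MM-yyyy"))))

// df.select(date_part(col("order_date")))   // hypothetical column name

// For SQL-string queries, register an ordinary Scala function as a UDF:
spark.udf.register("date_part", (s: String) =>
  java.time.LocalDate
    .parse(s, java.time.format.DateTimeFormatter.ofPattern("dd-MM-yyyy"))
    .getMonthValue)

// spark.sql("SELECT date_part(order_date) FROM orders")   // hypothetical table
```

A `Column => Column` method cannot be referenced from a SQL string directly; only registered UDFs (or built-ins) are visible there, which is why the two variants are kept separate.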

Hive - How to cast array to string?

Submitted by 匆匆过客 on 2021-02-08 10:13:00
Question: I'm trying to coerce a column containing a comma-separated array to a string in Hive:

    SELECT email_address, CAST(explode(GP_array AS STRING)) AS GP
    FROM dm.TP

I get the following error:

    Line: 1 - FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

Answer 1: The explode function explodes an array into multiple rows, returning a row-set with a single column (col), one row for each element of the array. You would need the concat_ws function to
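The concat_ws approach the answer points to, sketched against the column names from the question (assuming GP_array is an array<string>; cast the elements first otherwise):

```sql
SELECT email_address,
       concat_ws(',', GP_array) AS GP  -- joins the array elements into one comma-separated string
FROM dm.TP;
```

Unlike explode, concat_ws is an ordinary function, so it can be used anywhere in the SELECT list.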

Extracting a substring from a text column in Hive

Submitted by 喜你入骨 on 2021-02-08 10:04:42
Question: We have text data in a column named title, like below:

    "id":"S-1-98-13474422323-33566802","name":"uid=Xzdpr0,ou=people,dc=vm,dc=com","shortName":"XZDPR0","displayName":"Jund Lee","emailAddress":"jund.lee@bm.com","title":"Leading Product Investor"

I need to extract just the display name ("Jund Lee" in this example) from the above text data in Hive. I have tried using the substring function, but it doesn't seem to work. Please help.

Answer 1: Use the regexp_extract function with a matching regex to capture only the
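A sketch of the regexp_extract approach against the sample data (the table name below is a placeholder); the capture group grabs whatever follows `"displayName":"` up to the next quote:

```sql
SELECT regexp_extract(title, '"displayName":"([^"]+)"', 1) AS display_name
FROM some_table;  -- placeholder table name
```

For the sample row this yields Jund Lee. The third argument, 1, selects the first capture group rather than the whole match.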