hcatalog

Hadoop - Load Hive tables using PIG

北城以北 submitted on 2019-12-13 07:56:04
Question: I want to load Hive tables using Pig. I think we can do this through HCatLoader, but I am loading XML files into Pig, and for that I have to use XMLLoader. Can I use both options to load XML files in Pig? I am extracting data from the XML files using my own UDF, and once all the data is extracted, I have to load the Pig data into Hive tables. I can't use Hive to extract the XML data, as the XML I receive is quite complex, so I wrote my own UDF to parse it. Any suggestions or pointers on how we can load …
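A common pattern for this situation (sketched below; the jar, UDF, file path, field names, and table name are all hypothetical) is to keep XMLLoader as the only loader, parse the records with the custom UDF, and then write the result into the Hive table with HCatStorer, so no second loader is needed:

```pig
-- Run with: pig -useHCatalog script.pig
REGISTER myudfs.jar;

-- Read raw XML records rooted at <record>
raw = LOAD '/data/input.xml'
      USING org.apache.pig.piggybank.storage.XMLLoader('record')
      AS (xml:chararray);

-- Parse each record with the custom UDF into typed fields
parsed = FOREACH raw GENERATE
         FLATTEN(myudfs.ParseRecord(xml)) AS (id:int, name:chararray);

-- Write into an existing Hive table via HCatalog
STORE parsed INTO 'mydb.mytable'
      USING org.apache.hive.hcatalog.pig.HCatStorer();
```

The target table must already exist in the Hive metastore; HCatStorer matches the Pig schema against the table's columns.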

Hive - Varchar vs String , Is there any advantage if the storage format is Parquet file format

寵の児 submitted on 2019-12-10 14:48:44
Question: I have a Hive table which will hold billions of records. It is time-series data, so the partition is per minute, and per minute we will have around 1 million records. I have a few fields in my table: VIN number (17 chars), Status (2 chars), etc. So my question is: during table creation, if I choose Varchar(X) vs String, is there any storage or performance difference? A few limitations of varchar are listed at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-string
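For context, the two choices look like this in DDL (a hypothetical table, not from the question). In Parquet, both types are stored as UTF-8 byte arrays, so varchar(n) generally changes enforcement rather than storage: values longer than n are truncated on write, and Hive spends a little extra effort on the length check.

```sql
-- STRING variant: no length limit enforced
CREATE TABLE vehicle_status_str (
  vin    STRING,
  status STRING
)
PARTITIONED BY (event_minute STRING)
STORED AS PARQUET;

-- VARCHAR variant: values longer than the declared length are truncated
CREATE TABLE vehicle_status_vc (
  vin    VARCHAR(17),
  status VARCHAR(2)
)
PARTITIONED BY (event_minute STRING)
STORED AS PARQUET;
```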

Pig not loading data into HCatalog table - HortonWorks Sandbox [closed]

ぐ巨炮叔叔 submitted on 2019-12-10 12:03:37
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 3 years ago. I am running a Pig script in the HortonWorks virtual machine with the goal of extracting certain parts of my XML dataset and loading those parts into columns of an HCatalog table. On my local machine, I run my Pig script on the XML file and get an output file with all the extracted parts. However, for some …
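One thing worth checking on the sandbox (a hedged suggestion; the script name and jar path below are hypothetical) is that Pig was started with HCatalog support, since without it the HCatalog store classes cannot be resolved even though the same script runs fine against plain files locally:

```shell
# Start Pig with the HCatalog jars on the classpath
pig -useHCatalog my_extract.pig

# On older sandboxes, the jars can also be passed explicitly:
# pig -Dpig.additional.jars=/usr/lib/hive-hcatalog/share/hcatalog/*.jar my_extract.pig
```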

HiveServer2 configuration and tuning

房东的猫 submitted on 2019-12-06 16:37:59
First, a brief look at the main differences between HS1 and HS2. HiveServer1: the HiveServer process and the MetaStore process run inside the same JVM, and a HiveServer1 service can only serve one client connection at a time. HiveServer2: the main improvement in HS2 is that the MetaStore server was split out of HiveServer into a separate process, and both HiveServer and the MetaStore server can serve multiple clients at the same time (Beeline CLI, Hive CLI, HCatalog, and so on). Configure the HiveServer2 transport protocol, http or tcp:

<property>
  <name>hive.server2.transport.mode</name>
  <value>binary</value>
  <description>Server transport mode. "binary" or "http".</description>
</property>

The port used in HTTP mode:

<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
  <description>Port number when in HTTP mode.</description>
</property>

The port used in TCP (binary) mode: …
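Once HS2 is up, a quick way to verify the transport settings above (the host name and user below are hypothetical) is to connect with Beeline in each mode:

```shell
# Binary (TCP) mode, default Thrift port 10000
beeline -u "jdbc:hive2://hs2-host:10000/default" -n hive

# HTTP mode, matching hive.server2.thrift.http.port above
beeline -u "jdbc:hive2://hs2-host:10001/default;transportMode=http;httpPath=cliservice" -n hive
```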

[Practical guide] Apache Hive 2.1.1 installation and configuration in detail: hive, beeline, hwi, HCatalog, WebHCat, and other components

拥有回忆 submitted on 2019-12-05 23:02:36
After successfully building an Apache Hadoop 2.8 distributed cluster in a Docker environment, with NameNode HA and ResourceManager HA (see my other post: Apache Hadoop 2.8 distributed cluster setup in detail), the next step is to set up the latest stable release, Apache Hive 2.1.1, which makes it convenient to test Hive configuration and jobs on my own machine; the same configuration also applies on a server. Below is the detailed installation and configuration process for Apache Hive 2.1.1. 1. Read the official Apache Hive documentation and download the latest version. Hive is a Hadoop-based data warehouse tool: it maps structured data in HDFS to tables and translates SQL-like scripts into MapReduce jobs, so users only need to provide SQL statements, as with a traditional relational database, to analyze and process data on Hadoop. The barrier to entry is low, which makes it a very good fit for moving traditional RDBMS-based analysis onto Hadoop; Hive is therefore a very important tool in the Hadoop ecosystem. The most direct way to get Apache Hive installed and configured is to read the official Apache Hive documentation, which contains a lot of useful information. Apache Hive requires JDK 1.7 or later and Hadoop 2.x (Hive 2.0.0 dropped support for Hadoop 1.x); Hive can be deployed on Linux, Mac, or Windows. Download the latest stable version from the official site …
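The download-and-unpack step described above can be sketched as follows (the install path and environment file are assumptions, not from the original post):

```shell
# Unpack the downloaded release and set up the environment
tar -xzf apache-hive-2.1.1-bin.tar.gz -C /opt
export HIVE_HOME=/opt/apache-hive-2.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH
```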

Hive error: parseexception missing EOF

旧街凉风 submitted on 2019-12-05 16:54:34
Question: I am not sure what I am doing wrong here: hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING) stored as orc tblproperties ("orc.compress"="NONE") LOCATION "/user/hive/test_table"; FAILED: ParseException line 1:107 missing EOF at 'LOCATION' near ')' — while the following query works perfectly fine: hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING) stored as orc tblproperties ("orc.compress"="NONE"); OK Time taken: 0.106 seconds. Am I missing something here? Any pointers will …
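The parser error above is a clause-ordering issue: in Hive's CREATE TABLE grammar, LOCATION must come before TBLPROPERTIES, which is why the statement without TBLPROPERTIES parses fine. A corrected version of the failing statement:

```sql
CREATE TABLE default.testtbl (
  int1    INT,
  string1 STRING
)
STORED AS ORC
LOCATION '/user/hive/test_table'
TBLPROPERTIES ("orc.compress" = "NONE");
```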

Type conversion pig hcatalog

杀马特。学长 韩版系。学妹 submitted on 2019-12-05 07:22:17
I use HCatalog version 0.4. I have a table in Hive, 'abc', which has a column with the datatype 'timestamp'. When I try to run a Pig script like this: "raw_data = load 'abc' using org.apache.hcatalog.pig.HCatLoader();" I get an error saying "java.lang.TypeNotPresentException: Type timestamp not present". The problem is that HCatalog doesn't support the timestamp type. It is supported from Hive 0.13; there is an issue about this problem that has already been resolved, see https://issues.apache.org/jira/browse/HIVE-5814 If you use Hive-HCatalog 0.13.0, check the path to HCatLoader; you must …
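The "check the path to HCatLoader" advice refers to the package rename that shipped with 0.13: the HCatalog classes moved from org.apache.hcatalog to org.apache.hive.hcatalog, so the load statement becomes:

```pig
-- HCatalog 0.13+ package path (older releases used org.apache.hcatalog.pig.HCatLoader)
raw_data = LOAD 'abc' USING org.apache.hive.hcatalog.pig.HCatLoader();
```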

What is use of hcatalog in hadoop?

僤鯓⒐⒋嵵緔 submitted on 2019-12-03 18:51:42
Question: I'm new to Hadoop. I know that HCatalog is a table and storage management layer for Hadoop, but how exactly does it work, and how do I use it? Please give a simple example. Answer 1: HCatalog supports reading and writing files in any format for which a Hive SerDe (serializer-deserializer) can be written. By default, HCatalog supports the RCFile, CSV, JSON, and SequenceFile formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe. HCatalog is built on top of the Hive …
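As a simple example of the kind the question asks for (the table and field names below are hypothetical), HCatalog lets a Pig script read a Hive table by name, picking up the schema from the metastore instead of declaring it by hand:

```pig
-- Run with: pig -useHCatalog
-- Reads the Hive table 'web_logs'; the schema comes from the metastore
logs = LOAD 'default.web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();
errors = FILTER logs BY status == 500;
DUMP errors;
```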

is there any metadata store like 'hive metastore' in BigQuery?

主宰稳场 submitted on 2019-11-30 18:32:29
Question: I am new to BigQuery. I just want to know whether we have anything like the Hive metastore (metadata about all tables, columns, and their descriptions) in BigQuery? Answer 1: BigQuery offers some special tables whose contents represent metadata, such as the list of tables and views in a dataset. These "meta-tables" are read-only. To access metadata about the tables and views in a dataset, use the __TABLES_SUMMARY__ meta-table in a query's SELECT statement. You can run the query using the BigQuery web …
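A minimal form of the meta-table query described in the answer (the dataset name is hypothetical; this uses the legacy-SQL bracket syntax that the __TABLES_SUMMARY__ meta-table was designed around):

```sql
-- Lists table_id, creation_time, and type for every table/view in the dataset
SELECT *
FROM [mydataset.__TABLES_SUMMARY__];
```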

What is use of hcatalog in hadoop?

◇◆丶佛笑我妖孽 submitted on 2019-11-30 01:44:47
I'm new to Hadoop. I know that HCatalog is a table and storage management layer for Hadoop, but how exactly does it work, and how do I use it? Please give a simple example. Mayank Agarwal: HCatalog supports reading and writing files in any format for which a Hive SerDe (serializer-deserializer) can be written. By default, HCatalog supports the RCFile, CSV, JSON, and SequenceFile formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe. HCatalog is built on top of the Hive metastore and incorporates components from the Hive DDL. HCatalog provides read and write …
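The read and write interfaces the answer is describing are HCatLoader and HCatStorer; a write-side sketch (the input file, field names, and table are hypothetical, and the target table must already exist in Hive):

```pig
-- Run with: pig -useHCatalog
data = LOAD '/data/events.csv' USING PigStorage(',') AS (id:int, msg:chararray);
STORE data INTO 'default.events' USING org.apache.hive.hcatalog.pig.HCatStorer();
```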