hcatalog

Hive Row Formats & SerDe

时光怂恿深爱的人放手 Submitted on 2020-04-29 13:45:53
SerDe is short for Serializer/Deserializer. Hive uses a SerDe (together with a FileFormat) to serialize and deserialize row objects, i.e. to read and write table rows:

HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object
Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files

When reading from HDFS the key part is ignored, and when writing to HDFS the key is always a constant; the row data itself lives in the value. When creating a table, you can specify a user-defined SerDe or use a built-in (native) one. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, the native SerDe is used.
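The two declaration styles can be sketched in HiveQL as follows; table and column names are illustrative, and OpenCSVSerde is just one example of an explicitly named SerDe:

```sql
-- Native SerDe (LazySimpleSerDe) selected implicitly via ROW FORMAT DELIMITED:
CREATE TABLE logs_delimited (
  id  INT,
  msg STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- An explicitly named SerDe, here the built-in CSV one:
CREATE TABLE logs_csv (
  id  STRING,
  msg STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
```

In both cases the chosen SerDe sits between the FileFormat and the row object exactly as in the pipeline diagrams above.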

Hive 0.13 external table dynamic partitioning custom pattern

╄→尐↘猪︶ㄣ Submitted on 2020-01-06 08:37:30
Question: According to the documentation, you should be able to specify a custom pattern for the partitions of a Hive external table. However, I can't get it to work: select * from rawlog_test7 limit 10; returns no records. This is what I am doing:

set hcat.dynamic.partitioning.custom.pattern="${year}/${month}/${day}/${hour}"

I create my table with ... partitioned by (year int, month int, day int, hour int) location '/history.eu1/ed_reports/hourly/'; and my directory structure is ../2014/06/18/13/.
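If the custom pattern is not picked up, one workaround (a sketch, assuming the table and directory layout from the question) is to register each directory as a partition explicitly, since Hive only returns rows from partitions that exist in the metastore:

```sql
-- Register one hour's directory as a partition; repeat (or script) per directory.
ALTER TABLE rawlog_test7 ADD IF NOT EXISTS
  PARTITION (year=2014, month=6, day=18, hour=13)
  LOCATION '/history.eu1/ed_reports/hourly/2014/06/18/13/';
```

After the partitions are registered, the select should find the underlying files regardless of the directory naming scheme.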

ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

扶醉桌前 Submitted on 2020-01-02 12:25:15
Question: I am new to Hadoop. I was trying to integrate Pig with Hive using HCatalog but got the error below during dump. Please let me know if any of you can help me out:

A = load 'logs' using org.apache.hcatalog.pig.HCatLoader();
dump A

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

load and describe work fine, but dump gives the above error. Details: hadoop-2.6.0, pig-0.14.0, hive-0.12.0
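This error typically means some component was compiled against the Hadoop 1 API (where JobContext was a class) but is running on Hadoop 2 (where it is an interface). A hedged sketch of one commonly reported remedy, rebuilding Pig against the Hadoop 2 API; the ant target and flag below apply to older Pig source trees and may differ per version:

```shell
# Run from the Pig source root; "23" selects the Hadoop 2 API line.
ant clean jar-withouthadoop -Dhadoopversion=23
```

The mismatched binary can also be HCatalog itself (hive-0.12.0 here predates full Hadoop 2 support), so using a Hive/HCatalog build that matches the installed Hadoop major version addresses the same root cause.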

PIG - Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

泪湿孤枕 Submitted on 2019-12-28 04:28:07
Question: I was trying to load a table from Hive, using HCatalog for that. I started Pig with pig -useHCatalog and registered almost all jars from Hive and Hadoop:

register 'hdfs://localhost:8020/user/pig/jars/hive-jdbc-0.10.0-cdh4.5.0.jar';
register 'hdfs://localhost:8020/user/pig/jars/hive-exec-0.10.0-cdh4.5.0.jar';
register 'hdfs://localhost:8020/user/pig/jars/hive-common-0.10.0-cdh4.5.0.jar';
register 'hdfs://localhost:8020/user/pig/jars/hive-metastore-0.10.0-cdh4.5.0.jar';
register 'hdfs:/
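For comparison, a minimal sketch of the intended load (table name illustrative), assuming pig -useHCatalog already puts the Hive and HCatalog jars on the classpath, which normally makes the individual register statements unnecessary:

```
-- Launched via: pig -useHCatalog
A = LOAD 'my_hive_table' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;
DUMP A;
```

If the load still fails with -useHCatalog, the jar versions (here Hive 0.10 on CDH 4.5) are the more likely culprit than missing registrations.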

Exporting sequence file to Oracle by Sqoop

房东的猫 Submitted on 2019-12-25 06:33:08
Question: I have been trying to find documentation about how to export a sequence file to Oracle using Sqoop. Is that possible? Currently my files (in HDFS) are in a text-based format, and I am using Sqoop to export them to some Oracle tables, which works fine. Now I want to change the file format from text to sequence file or something else (Avro later). So what do I need to do if I want to export a different file format from HDFS to Oracle using Sqoop? Any information will be
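For reference, a sketch of the text-file export that already works in this setup; connection string, table, and paths are illustrative, and the delimiter flag shown applies specifically to delimited text input:

```shell
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott --password-file /user/me/.password \
  --table TARGET_TABLE \
  --export-dir /data/out/text \
  --input-fields-terminated-by '\t'
```

Non-text formats change which input-parsing options apply, so the same command generally cannot be reused unchanged for sequence or Avro files.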

Sqoop import to HCatalog/Hive - table not visible

此生再无相见时 Submitted on 2019-12-24 01:57:17
Question: HDP-2.4.2.0-258 installed using Ambari 2.2.2.0. I have to import several SQL Server schemas which should be accessible via Hive, Pig, MR, and any third party (in the future). I decided to import into HCatalog. Sqoop provides ways to import to Hive OR HCatalog; I guess that if I import to HCatalog, the same table will be accessible from the Hive CLI, from MR, and from Pig (please evaluate my assumption). Questions: If imported to Hive directly, will the table be available to Pig and MR? If imported to HCatalog, what
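A sketch of an HCatalog-targeted import, with illustrative connection details and names; a table imported this way lands in the Hive metastore, so it is visible to the Hive CLI, to Pig via HCatLoader, and to MR via HCatInputFormat:

```shell
sqoop import \
  --connect 'jdbc:sqlserver://dbhost:1433;database=mydb' \
  --username user -P \
  --table SRC_TABLE \
  --hcatalog-database default \
  --hcatalog-table src_table \
  --create-hcatalog-table
```

Since Hive and HCatalog share one metastore, a table imported directly to Hive is likewise reachable from Pig and MR through the HCatalog interfaces.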

Getting an error on running HCatalog

被刻印的时光 ゝ Submitted on 2019-12-23 11:31:33
Question:

A = LOAD 'eventnew.txt' USING HCatalogLoader();
2015-07-08 19:56:34,875 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve HCatalogLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/KS5023833/pig_1436364102374.log

Then I tried:

A = LOAD 'xyz' USING org.apache.hive.hcatalog.pig.HCatLoader();

This is also not working:

1070: Could not resolve org.apache.hive.hcatalog.pig.HCatLoader using imports: [, java
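A sketch of the corrected invocation, assuming the Pig shell is started with the HCatalog classpath and the table exists in the metastore (names illustrative). Note the class is HCatLoader, not HCatalogLoader, and its package depends on the Hive version (org.apache.hcatalog.pig in older releases, org.apache.hive.hcatalog.pig in newer ones); the second 1070 error above suggests the jars were simply not on the classpath:

```
-- Launch the shell with: pig -useHCatalog
A = LOAD 'default.eventnew' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE A;
```

HCatLoader loads Hive tables by name, so passing a file path like 'eventnew.txt' would not work even with the correct class.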

Type conversion pig hcatalog

一曲冷凌霜 Submitted on 2019-12-22 05:25:26
Question: I use HCatalog version 0.4. I have a table 'abc' in Hive which has a column with datatype 'timestamp'. When I try to run a Pig script like raw_data = load 'abc' using org.apache.hcatalog.pig.HCatLoader(); I get an error saying "java.lang.TypeNotPresentException: Type timestamp not present". Answer 1: The problem is that HCatalog doesn't support the timestamp type. It will be supported as of Hive 0.13; they have an issue about this problem that has already been resolved, which you can see at https:
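Until that support is available, one workaround (a sketch, with illustrative column names) is to materialize the timestamp column as a string on the Hive side and point HCatLoader at the copy:

```sql
-- Copy the table, casting the unsupported timestamp column to STRING.
CREATE TABLE abc_str AS
SELECT id, CAST(event_ts AS STRING) AS event_ts
FROM abc;
```

The Pig script then loads 'abc_str' instead of 'abc', and the value arrives as a chararray that can be parsed in Pig if needed.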

How to set the VCORES in hadoop mapreduce/yarn?

倾然丶 夕夏残阳落幕 Submitted on 2019-12-21 02:42:27
Question: The following is my configuration:

mapred-site.xml
  map-mb: 4096, opts: -Xmx3072m
  reduce-mb: 8192, opts: -Xmx6144m
yarn-site.xml
  resource memory-mb: 40GB
  min allocation-mb: 1GB

The VCores shown for the Hadoop cluster display 8, but I don't know how that is computed or where to configure it. I hope someone can help me. Answer 1: Short answer: it most probably doesn't matter if you are just running Hadoop out of the box on your single-node cluster or even a small personal distributed cluster. You
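The vcore count is not derived from the memory settings: each NodeManager advertises whatever yarn-site.xml tells it to. A sketch of the relevant properties (values illustrative):

```xml
<!-- Virtual cores this NodeManager offers to the scheduler. -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<!-- Largest vcore allocation a single container may request. -->
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>8</value>
</property>
```

By default the capacity scheduler sizes containers by memory only, which is why vcore settings often appear to have no effect on small clusters.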