hcatalog

Hive Row Formats & SerDe

时光怂恿深爱的人放手 Submitted on 2020-04-29 13:45:53
SerDe is short for Serializer/Deserializer. Hive uses a SerDe (together with a FileFormat) to serialize and deserialize row objects, i.e. to read and write table rows:

HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object
Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files

When reading from HDFS the key part is ignored, and when writing to HDFS the key is always a constant; the row data itself lives in the value. When creating a table, you can specify a user-defined SerDe or use a built-in (native) one. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, the native SerDe is used.
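The two declaration styles can be sketched in HiveQL as follows; table and column names are illustrative, and OpenCSVSerde is just one example of an explicitly named SerDe:

```sql
-- Native SerDe (LazySimpleSerDe) selected implicitly via ROW FORMAT DELIMITED:
CREATE TABLE logs_delimited (
  id  INT,
  msg STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- An explicitly named SerDe, here the built-in CSV one:
CREATE TABLE logs_csv (
  id  STRING,
  msg STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
```

In both cases the chosen SerDe sits between the FileFormat and the row object exactly as in the pipeline diagrams above.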

Hive 0.13 external table dynamic partitioning custom pattern

╄→尐↘猪︶ㄣ Submitted on 2020-01-06 08:37:30
Question: According to the documentation, you should be able to specify a custom pattern for the partitions of a Hive external table. However, I can't get it to work: select * from rawlog_test7 limit 10; returns no records. This is what I am doing:

set hcat.dynamic.partitioning.custom.pattern="${year}/${month}/${day}/${hour}"

I create my table with ... partitioned by (year int, month int, day int, hour int) location '/history.eu1/ed_reports/hourly/'; and my directory structure is ../2014/06/18/13/.
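If the custom pattern is not picked up, one workaround (a sketch, assuming the table and directory layout from the question) is to register each directory as a partition explicitly, since Hive only returns rows from partitions that exist in the metastore:

```sql
-- Register one hour's directory as a partition; repeat (or script) per directory.
ALTER TABLE rawlog_test7 ADD IF NOT EXISTS
  PARTITION (year=2014, month=6, day=18, hour=13)
  LOCATION '/history.eu1/ed_reports/hourly/2014/06/18/13/';
```

After the partitions are registered, the select should find the underlying files regardless of the directory naming scheme.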

ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

扶醉桌前 Submitted on 2020-01-02 12:25:15
Question: I am new to Hadoop. I was trying to integrate Pig with Hive using HCatalog but got the error below during dump. Please let me know if any of you can help me out:

A = load 'logs' using org.apache.hcatalog.pig.HCatLoader();
dump A

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

load and describe work fine, but dump gives the above error. Details: hadoop-2.6.0, pig-0.14.0, hive-0.12.0
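This error typically means some component was compiled against the Hadoop 1 API (where JobContext was a class) but is running on Hadoop 2 (where it is an interface). A hedged sketch of one commonly reported remedy, rebuilding Pig against the Hadoop 2 API; the ant target and flag below apply to older Pig source trees and may differ per version:

```shell
# Run from the Pig source root; "23" selects the Hadoop 2 API line.
ant clean jar-withouthadoop -Dhadoopversion=23
```

The mismatched binary can also be HCatalog itself (hive-0.12.0 here predates full Hadoop 2 support), so using a Hive/HCatalog build that matches the installed Hadoop major version addresses the same root cause.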

PIG - Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

泪湿孤枕 Submitted on 2019-12-28 04:28:07
Question: I was trying to load a table from Hive, using HCatalog for that. I started Pig with pig -useHCatalog and registered almost all jars from Hive and Hadoop:

register 'hdfs://localhost:8020/user/pig/jars/hive-jdbc-0.10.0-cdh4.5.0.jar';
register 'hdfs://localhost:8020/user/pig/jars/hive-exec-0.10.0-cdh4.5.0.jar';
register 'hdfs://localhost:8020/user/pig/jars/hive-common-0.10.0-cdh4.5.0.jar';
register 'hdfs://localhost:8020/user/pig/jars/hive-metastore-0.10.0-cdh4.5.0.jar';
register 'hdfs:/
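For comparison, a minimal sketch of the intended load (table name illustrative), assuming pig -useHCatalog already puts the Hive and HCatalog jars on the classpath, which normally makes the individual register statements unnecessary:

```
-- Launched via: pig -useHCatalog
A = LOAD 'my_hive_table' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;
DUMP A;
```

If the load still fails with -useHCatalog, the jar versions (here Hive 0.10 on CDH 4.5) are the more likely culprit than missing registrations.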

Exporting sequence file to Oracle by Sqoop

房东的猫 Submitted on 2019-12-25 06:33:08
Question: I have been trying to find documentation about how to export a sequence file to Oracle using Sqoop. Is that possible? Currently my files (in HDFS) are in a text-based format, and I am using Sqoop to export them to some Oracle tables, which works fine. Now I want to change the file format from text to sequence file or something else (Avro later). So what do I need to do if I want to export a different file format from HDFS to Oracle using Sqoop? Any information will be
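For reference, a sketch of the text-file export that already works in this setup; connection string, table, and paths are illustrative, and the delimiter flag shown applies specifically to delimited text input:

```shell
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott --password-file /user/me/.password \
  --table TARGET_TABLE \
  --export-dir /data/out/text \
  --input-fields-terminated-by '\t'
```

Non-text formats change which input-parsing options apply, so the same command generally cannot be reused unchanged for sequence or Avro files.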

Sqoop import to HCatalog/Hive - table not visible

此生再无相见时 Submitted on 2019-12-24 01:57:17
Question: HDP-2.4.2.0-258 installed using Ambari 2.2.2.0. I have to import several SQL Server schemas which should be accessible via Hive, Pig, MR, and any third party (in the future). I decided to import into HCatalog. Sqoop provides ways to import to Hive OR HCatalog; I guess that if I import to HCatalog, the same table will be accessible from the Hive CLI, from MR, and from Pig (please evaluate my assumption). Questions: If imported to Hive directly, will the table be available to Pig and MR? If imported to HCatalog, what
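A sketch of an HCatalog-targeted import, with illustrative connection details and names; a table imported this way lands in the Hive metastore, so it is visible to the Hive CLI, to Pig via HCatLoader, and to MR via HCatInputFormat:

```shell
sqoop import \
  --connect 'jdbc:sqlserver://dbhost:1433;database=mydb' \
  --username user -P \
  --table SRC_TABLE \
  --hcatalog-database default \
  --hcatalog-table src_table \
  --create-hcatalog-table
```

Since Hive and HCatalog share one metastore, a table imported directly to Hive is likewise reachable from Pig and MR through the HCatalog interfaces.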

Getting an error on running HCatalog

被刻印的时光 ゝ Submitted on 2019-12-23 11:31:33
Question:

A = LOAD 'eventnew.txt' USING HCatalogLoader();
2015-07-08 19:56:34,875 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve HCatalogLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/KS5023833/pig_1436364102374.log

Then I tried:

A = LOAD 'xyz' USING org.apache.hive.hcatalog.pig.HCatLoader();

This is also not working:

1070: Could not resolve org.apache.hive.hcatalog.pig.HCatLoader using imports: [, java
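A sketch of the corrected invocation, assuming the Pig shell is started with the HCatalog classpath and the table exists in the metastore (names illustrative). Note the class is HCatLoader, not HCatalogLoader, and its package depends on the Hive version (org.apache.hcatalog.pig in older releases, org.apache.hive.hcatalog.pig in newer ones); the second 1070 error above suggests the jars were simply not on the classpath:

```
-- Launch the shell with: pig -useHCatalog
A = LOAD 'default.eventnew' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE A;
```

HCatLoader loads Hive tables by name, so passing a file path like 'eventnew.txt' would not work even with the correct class.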

Type conversion pig hcatalog

一曲冷凌霜 Submitted on 2019-12-22 05:25:26
Question: I use HCatalog version 0.4. I have a table 'abc' in Hive which has a column with datatype 'timestamp'. When I try to run a Pig script like raw_data = load 'abc' using org.apache.hcatalog.pig.HCatLoader(); I get an error saying "java.lang.TypeNotPresentException: Type timestamp not present". Answer 1: The problem is that HCatalog doesn't support the timestamp type. It will be supported as of Hive 0.13; they have an issue about this problem that has already been resolved, which you can see at https:
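Until that support is available, one workaround (a sketch, with illustrative column names) is to materialize the timestamp column as a string on the Hive side and point HCatLoader at the copy:

```sql
-- Copy the table, casting the unsupported timestamp column to STRING.
CREATE TABLE abc_str AS
SELECT id, CAST(event_ts AS STRING) AS event_ts
FROM abc;
```

The Pig script then loads 'abc_str' instead of 'abc', and the value arrives as a chararray that can be parsed in Pig if needed.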

How to set the VCORES in hadoop mapreduce/yarn?

倾然丶 夕夏残阳落幕 Submitted on 2019-12-21 02:42:27
Question: The following is my configuration:

mapred-site.xml
  map-mb: 4096, opts: -Xmx3072m
  reduce-mb: 8192, opts: -Xmx6144m
yarn-site.xml
  resource memory-mb: 40GB
  min allocation-mb: 1GB

The VCores shown for the Hadoop cluster display 8, but I don't know how that is computed or where to configure it. I hope someone can help me. Answer 1: Short answer: it most probably doesn't matter if you are just running Hadoop out of the box on your single-node cluster or even a small personal distributed cluster. You
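The vcore count is not derived from the memory settings: each NodeManager advertises whatever yarn-site.xml tells it to. A sketch of the relevant properties (values illustrative):

```xml
<!-- Virtual cores this NodeManager offers to the scheduler. -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<!-- Largest vcore allocation a single container may request. -->
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>8</value>
</property>
```

By default the capacity scheduler sizes containers by memory only, which is why vcore settings often appear to have no effect on small clusters.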