Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

情书的邮戳 2021-02-20 10:31

I run into an issue when I execute SHOW CREATE TABLE on an ORC table and then execute the CREATE TABLE statement it returns.

Using sho

2 Answers
  •  别那么骄傲
    2021-02-20 11:03

    STORED AS implies 3 things:

    1. SERDE
    2. INPUTFORMAT
    3. OUTPUTFORMAT

    You have defined only the last two, so the SERDE falls back to the value of hive.default.serde.

    hive.default.serde
    Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    Added in: Hive 0.14 with HIVE-5976
    The default SerDe Hive will use for storage formats that do not specify a SerDe.
    Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.

    Demo

    hive.default.serde

    set hive.default.serde;
    

    hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    

    STORED AS ORC

    create table mytable (i int) 
    stored as orc;
    
    show create table mytable;
    

    Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

    CREATE TABLE `mytable`(
      `i` int)
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'file:/home/cloudera/local_db/mytable'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
      'numFiles'='0', 
      'numRows'='0', 
      'rawDataSize'='0', 
      'totalSize'='0', 
      'transient_lastDdlTime'='1496982059')
    

    STORED AS INPUTFORMAT ... OUTPUTFORMAT ...

    create table mytable2 (i int) 
    STORED AS 
    INPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    ;
    
    show create table mytable2
    ;
    

    Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

    CREATE TABLE `mytable2`(
      `i` int)
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'file:/home/cloudera/local_db/mytable2'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
      'numFiles'='0', 
      'numRows'='0', 
      'rawDataSize'='0', 
      'totalSize'='0', 
      'transient_lastDdlTime'='1496982426')
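
Following from the two outputs above, one way to get the ORC SerDe while still spelling out the input and output formats is to add an explicit ROW FORMAT SERDE clause, which mirrors what SHOW CREATE TABLE printed for mytable. This is a minimal sketch: the table name mytable3 is hypothetical, and the ALTER TABLE statement is an assumption about how one might repair mytable2 in place. It changes table metadata only, so it is best tried on an empty or test table first.

```sql
-- Hypothetical table: same formats as mytable2, but with the SerDe named explicitly
-- so that hive.default.serde is not used as a fallback.
create table mytable3 (i int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

-- Assumed repair for the existing table: point its metadata at the ORC SerDe.
alter table mytable2
set serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
;

-- Check which SerDe each table now reports.
show create table mytable3;
show create table mytable2;
```

If both statements succeed, the ROW FORMAT SERDE line in each output should name OrcSerde rather than LazySimpleSerDe.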
    
