NULL column names in Hive query result

陌路散爱 submitted on 2019-12-02 00:48:37

Interesting question. It took me a minute to realize what is going on, but with some knowledge of Hive it is actually obvious!

  1. The first thing to note is that the NULL values occur in columns that are not of type string.
  2. The second thing to realize is that the Hive CLI (unlike Beeline, for example) does NOT print column headers above your result by default.

So, putting 1 and 2 together:

  • The column names are fine, as you will see from a query like DESCRIBE Weather.
  • The file that you use as the data source appears to have column names in its first row. These now make up the first row of your Hive table. The columns of type string have no problem with this data, but columns of type int show NULL because the header strings cannot be cast to int.
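If you want to convince yourself of the casting behaviour, a quick sketch like the following should do it. The literal 'temperature' is just a made-up stand-in for a header string, and the table name Weather is taken from the question; adjust both to your setup.

```sql
-- A string that is not a number cannot be cast to INT, so Hive yields NULL.
-- 'temperature' is a hypothetical header value, not taken from the real file.
SELECT CAST('temperature' AS INT);

-- A numeric string casts without trouble.
SELECT CAST('42' AS INT);

-- Look at the first row of the table: if the header was loaded as data,
-- the string columns should show the column names and the int columns NULL.
SELECT * FROM Weather LIMIT 1;
```

The first query should return NULL and the second 42, which is the same thing that happens to the header row in the int columns.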

Suggestion:

Try to get rid of the first row, preferably before creating the external table.

To add to Dennis' comment above, you can tell Hive to skip the first line of the file when reading the table, for example if you're using a CSV SerDe like so:

CREATE EXTERNAL TABLE cases (
  id INT,
  case_number STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/hdfs/path'
tblproperties("skip.header.line.count"="1");

The operative line being:

tblproperties("skip.header.line.count"="1")
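If the table already exists, it should not be necessary to drop and recreate it. A minimal sketch, assuming a text-file-backed table named Weather (the name from the question) and a Hive version that supports this property (0.13 or later, as far as I know):

```sql
-- Set the header-skip property on an existing table instead of recreating it.
-- Weather is the table name from the question; replace it with your own.
ALTER TABLE Weather SET TBLPROPERTIES ("skip.header.line.count"="1");

-- Check that the property is now present on the table.
SHOW TBLPROPERTIES Weather;
```

Since the table is external, this only changes how Hive reads the file; the file itself still contains the header row.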