Why is a boolean field not working in Hive?

不想你离开。 submitted on 2021-01-29 02:21:19

Question


I have a column in my Hive table whose data type is boolean. When I import data from a CSV file, that column is stored as NULL.

This is my sample table:

CREATE TABLE IF NOT EXISTS Engineanalysis(
  EngineModel String,
  EnginePartNo String,
  Location String,
  Position String,
  InspectionReq boolean)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

My sample data:

AB01,AS01-IT01,AIRFRAME,,0
AB02,AS01-IT02,AIRFRAME,,1
AB03,AS01-IT03,AIRFRAME,,1
AB04,AS01-IT04,AIRFRAME,,1
AB05,AS01-IT05,HEAD,,1
AB06,AS01-IT06,HEAD,,0
AB07,AS01-IT07,HEAD,,0
AB08,AS01-IT08,HEAD,,0
AB09,AS01-IT09,NOSE,,1
AB10,AS01-IT10,NOSE,,0

Result:

AB01 AS01-IT01 AIRFRAME NULL
AB02 AS01-IT02 AIRFRAME NULL
AB03 AS01-IT03 AIRFRAME NULL
AB04 AS01-IT04 AIRFRAME NULL
AB05 AS01-IT05 HEAD NULL
AB06 AS01-IT06 HEAD NULL
AB07 AS01-IT07 HEAD NULL
AB08 AS01-IT08 HEAD NULL
AB09 AS01-IT09 NOSE NULL
AB10 AS01-IT10 NOSE NULL

When loading a row manually:

insert into Engineanalysis select 'AB11','AS01-IT11','AIRFRAME','',0;

Result:

AB11 AS01-IT11 AIRFRAME false

Can someone explain why these two results differ?


Answer 1:


The boolean type needs a literal representation. The SQL standard defines only three boolean values: TRUE, FALSE, and UNKNOWN (represented by NULL in Hive). Using integers as booleans is not standardized in SQL, though many databases support it.

Your table is declared with ROW FORMAT DELIMITED, so it uses LazySimpleSerDe to deserialize table data. LazySimpleSerDe reads the property hive.lazysimple.extended_boolean_literal to decide whether 'T', 't', 'F', 'f', '1', and '0' count as extended, legal boolean literals in addition to 'TRUE' and 'FALSE'. The default is false, so only 'TRUE' and 'FALSE' are treated as legal boolean literals. Any other value, including the 0 and 1 in your CSV, cannot be parsed and is read as NULL.
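A minimal Python sketch of the parsing rule just described may help. It is a model for illustration, not Hive code. The function name `parse_lazy_boolean` is hypothetical, and its `extended` flag stands in for hive.lazysimple.extended_boolean_literal. It also assumes that 'TRUE' and 'FALSE' are matched case-insensitively, which matches the Hive source as far as I know but is worth verifying for your version. `None` plays the role of NULL.

```python
from typing import Optional


def parse_lazy_boolean(field: str, extended: bool = False) -> Optional[bool]:
    """Model of how a delimited-text field becomes a BOOLEAN; None means NULL.

    Hypothetical helper: `extended` mirrors
    hive.lazysimple.extended_boolean_literal (default false).
    """
    # Standard literals (assumed case-insensitive).
    if field.upper() == "TRUE":
        return True
    if field.upper() == "FALSE":
        return False
    # Single-character extended literals, honoured only when the flag is on.
    if extended:
        if field in ("1", "t", "T"):
            return True
        if field in ("0", "f", "F"):
            return False
    # Anything else cannot be parsed and is read as NULL.
    return None


# Last column of two rows from the sample CSV, parsed with the flag off and on.
for line in ["AB01,AS01-IT01,AIRFRAME,,0", "AB02,AS01-IT02,AIRFRAME,,1"]:
    raw = line.split(",")[-1]
    print(raw, parse_lazy_boolean(raw), parse_lazy_boolean(raw, extended=True))
# Prints:
# 0 None False
# 1 None True
```

With the flag off, both sample values fall through to NULL, which matches the query result in the question.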

Set this property to read CSV files that use 1 and 0 as booleans:

set hive.lazysimple.extended_boolean_literal=true;

See Jira HIVE-3635. You can also try setting this property in the table DDL:

TBLPROPERTIES ("hive.lazysimple.extended_boolean_literal"="true")

As for using literals other than TRUE or FALSE in HiveQL, the official documentation (AllowedImplicitConversions) says that implicit conversion of other types to BOOLEAN is not possible. Yet, as the manual INSERT in the question shows, it works.

A few tests casting strings and integers to boolean:

hive> select cast('' as boolean);
OK
false
Time taken: 8.642 seconds, Fetched: 1 row(s)
hive> select cast('1' as boolean);
OK
true
Time taken: 4.773 seconds, Fetched: 1 row(s)
hive> select cast('f' as boolean);
OK
true
Time taken: 8.548 seconds, Fetched: 1 row(s)
hive> select cast('0' as boolean);
OK
true
Time taken: 0.851 seconds, Fetched: 1 row(s)
hive> select cast(0 as boolean);
OK
false
Time taken: 1.713 seconds, Fetched: 1 row(s)
hive> select cast(1 as boolean);
OK
true
Time taken: 4.604 seconds, Fetched: 1 row(s)

Also have a look at Jira HIVE-3604 and at the Type Conversion Functions documentation. The documentation says: "If cast(expr as boolean) Hive returns true for a non-empty string." This matches the UDFToBoolean source code, and it explains the results above: '0' and 'f' are non-empty strings, so they cast to true.
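The cast rules behind the hive> session above can be modelled in a few lines of Python. This is a sketch of the behaviour described in the answer, not Hive's implementation. The function `cast_to_boolean` is hypothetical, and other Hive releases may treat strings differently.

```python
from typing import Optional, Union


def cast_to_boolean(value: Union[str, int, None]) -> Optional[bool]:
    """Model of cast(value as boolean) as described above; None means NULL.

    Hypothetical helper: strings are true when non-empty, numbers are
    true when non-zero, NULL stays NULL. Behaviour may vary by Hive version.
    """
    if value is None:
        return None
    if isinstance(value, str):
        # Non-empty string -> true, so '0' and 'f' both become true.
        return len(value) > 0
    # Numeric input: zero -> false, anything else -> true.
    return value != 0


# Reproduce the six casts from the hive> session.
for v in ["", "1", "f", "0", 0, 1]:
    print(repr(v), cast_to_boolean(v))
# Prints:
# '' False
# '1' True
# 'f' True
# '0' True
# 0 False
# 1 True
```

The numeric branch most likely also explains the manual INSERT in the question. There, 0 is an integer expression converted to BOOLEAN, so it becomes false, instead of text parsed by LazySimpleSerDe.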

So it is better to use the officially supported literals for the boolean type. That avoids side effects such as those described in the article "Hive: Booleans Are Too Confusing To Be Usable".
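One way to follow that advice is to rewrite the CSV before loading it, so that the boolean column already contains TRUE/FALSE. The following is a minimal sketch using Python's standard csv module. The function name `normalize_boolean_column` and the `LITERALS` mapping are hypothetical. Column index 4 (zero-based) is assumed to be InspectionReq, as in the sample table.

```python
import csv
import io

# Hypothetical mapping from the CSV's 0/1 values to standard Hive literals.
LITERALS = {"1": "TRUE", "0": "FALSE"}


def normalize_boolean_column(src: str, column: int) -> str:
    """Rewrite 0/1 in one CSV column to TRUE/FALSE; other values pass through."""
    out = io.StringIO()
    # "\n" line endings to match LINES TERMINATED BY '\n' in the DDL.
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(src)):
        if len(row) > column:
            row[column] = LITERALS.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


sample = "AB01,AS01-IT01,AIRFRAME,,0\nAB02,AS01-IT02,AIRFRAME,,1\n"
print(normalize_boolean_column(sample, column=4), end="")
# Prints:
# AB01,AS01-IT01,AIRFRAME,,FALSE
# AB02,AS01-IT02,AIRFRAME,,TRUE
```

ROW FORMAT DELIMITED does not interpret CSV quoting, so this sketch assumes no field contains a comma. Otherwise csv.writer would add quotes that Hive would load literally.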



Source: https://stackoverflow.com/questions/55314572/why-boolean-field-is-not-working-in-hive
