How to skip headers when reading data from a CSV file in S3 and creating a table in AWS Athena

Submitted by 岁酱吖の on 2019-12-05 04:48:46

This is what works in Redshift:

You want to use table properties ('skip.header.line.count'='1'), along with other properties if you want, e.g. 'numRows'='100'. Here's a sample:

create external table exreddb1.test_table
(ID BIGINT 
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');

This is a known deficiency in Athena.

The best method I've seen was tweeted by Eric Hammond:

...WHERE date NOT LIKE '#%'

This appears to skip header lines during a query. I'm not sure how it works, but it might be a method for skipping NULLs.
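
For illustration, that filter could sit in a complete query like the sketch below. The table name test_table and the columns event_date and name are hypothetical, and the sketch assumes that event_date is declared as a string and that header lines begin with '#'. Note that NOT LIKE evaluates to NULL when the value is NULL, so rows where event_date is NULL are dropped as well.

```sql
-- Hypothetical table and column names, for illustration only.
-- Assumes event_date is a string column and header lines start with '#'.
SELECT event_date, name
FROM test_table
WHERE event_date NOT LIKE '#%';
```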

As of today (2019-11-18), the query from the OP seems to work, i.e. skip.header.line.count is honored and the first line is indeed skipped.
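
For reference, an Athena table definition using this property might look like the sketch below. The database name mydb, the table name, the columns, and the S3 path are placeholders. Athena uses Hive-style DDL, so the property goes in a TBLPROPERTIES clause.

```sql
-- A sketch only: mydb, test_table, the columns, and the S3 path are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.test_table (
  id   BIGINT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://mybucket/myfolder/'
-- Skip the first line of each file as a header row.
TBLPROPERTIES ('skip.header.line.count'='1');
```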
