How to load xls data from multiple xls files into Hive?

Submitted by Deadly on 2019-12-22 00:33:07

Question


I am learning to use Hadoop for performing Big Data related operations.

I need to perform some queries on a collection of data sets split across 8 xls files. Each xls file has multiple sheets and the query concerns only one of the sheets.

The dataset can be downloaded here: http://www.census.gov/hhes/www/hlthins/data/utilization/tables.html

I am not using any commercial Hadoop distribution for my tasks; I just have one master and one slave VM set up in VMware, with Hadoop, Hive, and Pig installed on them.

I am a novice with Hadoop and Big Data, so if anyone could guide me on how to proceed, I'd be very grateful.

If you need information on the queries or anything else, let me know.

Thanks.


Answer 1:


In Hive, you cannot load data into tables directly from xls files the way you can from txt or csv files.

You have two options:

  1. Write an application (e.g., in Java) to read the xls files and convert them into text or csv files that can be loaded directly into Hive.

OR

  2. Create your own SerDe (Serializer/Deserializer) that parses the xls data so it can be loaded into a table.

Both have their pros and cons. If you intend to use an application that interacts with Hive for loading, querying, transforming, etc., you can go with option 1. If you intend to work through scripts, batch jobs, etc., you can go with option 2.
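As a rough illustration of option 1, here is a minimal Python sketch (the answer suggests Java; Python is used here only for brevity). It assumes the third-party `xlrd` package is installed for reading legacy .xls workbooks. The function names, the `.tsv` output naming, and the choice of tab as the delimiter are all hypothetical choices for this example, not something from the question. It pulls one named sheet out of every .xls file in a directory and writes it as tab-delimited text.

```python
import glob
import os


def sheet_rows(xls_path, sheet_name):
    """Yield each row of one sheet in a legacy .xls workbook as a list of cell values."""
    import xlrd  # third-party; assumed installed (pip install xlrd)

    book = xlrd.open_workbook(xls_path)
    sheet = book.sheet_by_name(sheet_name)
    for i in range(sheet.nrows):
        yield sheet.row_values(i)


def clean(cell):
    """Stringify a cell and replace characters that would break a tab-delimited row."""
    return str(cell).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def write_delimited(rows, out_path):
    """Write rows as tab-delimited text, one row per line; return the row count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(clean(c) for c in row) + "\n")
            count += 1
    return count


def convert_all(in_dir, out_dir, sheet_name):
    """Convert the named sheet of every .xls file in in_dir to a .tsv file in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    for xls_path in sorted(glob.glob(os.path.join(in_dir, "*.xls"))):
        base = os.path.splitext(os.path.basename(xls_path))[0]
        out_path = os.path.join(out_dir, base + ".tsv")
        write_delimited(sheet_rows(xls_path, sheet_name), out_path)
        outputs.append(out_path)
    return outputs
```

The resulting files could then be loaded with Hive's `LOAD DATA LOCAL INPATH` statement into a table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`. Note that this sketch does not skip title or header rows, and that `xlrd` returns numeric cells as floats (so a whole number is written as, for example, `2010.0`), so some cleanup per sheet is likely still needed.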



Source: https://stackoverflow.com/questions/29429679/how-to-load-xls-data-from-multiple-xls-file-into-hive
