Is it possible to load a Parquet table directly from a file?

If I have a binary data file (it can be converted to CSV format), is there any way to load a Parquet table directly from it? Many tutorials show loading a CSV file into a text table, and then loading from the text table into a Parquet table. For efficiency, is it possible to load a Parquet table directly from a binary file like the one I already have, ideally using a CREATE EXTERNAL TABLE command? Or do I need to convert it to a CSV file first? Are there any file format restrictions?

Unfortunately, it is not possible to read a custom binary format in Impala. You should convert your files to CSV, create an external table over the existing CSV files as a temporary staging table, and finally insert into the final Parquet table by selecting from that staging table. The Impala Parquet documentation has much more information and some related examples. See the section about compacting small files, which follows a similar pattern.
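The workflow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (a little-endian 32-bit integer followed by a 64-bit float), the column names, and the helper names `binary_to_csv` and `impala_statements` are all hypothetical, since the question does not describe the actual binary format. The first function converts the binary records to CSV, and the second builds the Impala statements for the staging table, the Parquet table, and the final insert.

```python
import csv
import io
import struct

# Hypothetical record layout (an assumption, not from the question):
# little-endian int32 id followed by a float64 value, 12 bytes per record.
RECORD = struct.Struct("<id")


def binary_to_csv(src, dst):
    """Decode fixed-width records from src and write them to dst as CSV rows.

    Returns the number of records written.
    """
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    while True:
        chunk = src.read(RECORD.size)
        if not chunk:
            break
        if len(chunk) != RECORD.size:
            raise ValueError("truncated record at end of input")
        writer.writerow(RECORD.unpack(chunk))
        count += 1
    return count


def impala_statements(staging, final, location):
    """Build the three Impala statements for the CSV-to-Parquet workflow.

    The table names, column list, and location are placeholders.
    """
    columns = "(id INT, value DOUBLE)"
    return [
        # 1. External text table over the CSV files already in place.
        f"CREATE EXTERNAL TABLE {staging} {columns} "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{location}'",
        # 2. Final table stored as Parquet.
        f"CREATE TABLE {final} {columns} STORED AS PARQUET",
        # 3. Copy the rows, rewriting them in Parquet format.
        f"INSERT INTO {final} SELECT * FROM {staging}",
    ]


if __name__ == "__main__":
    raw = io.BytesIO(RECORD.pack(1, 2.5) + RECORD.pack(2, -0.25))
    out = io.StringIO()
    binary_to_csv(raw, out)
    print(out.getvalue(), end="")
    for stmt in impala_statements("csv_staging", "final_parquet", "/user/me/csv_data"):
        print(stmt + ";")
```

The generated statements would then be run through impala-shell or another client. Once the insert has finished, the staging table can be dropped; because it is external, dropping it leaves the CSV files in place.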

I don't know how you convert your file format to CSV, but you might consider writing a program that converts your binary format to Parquet directly, skipping the CSV step. For example, you can write a MapReduce job that writes Parquet files. Here's an example that reads and writes Parquet: https://github.com/cloudera/parquet-examples/blob/master/MapReduce/TestReadWriteParquet.java
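For smaller data sets, the same idea can be tried without MapReduce. The sketch below is not the approach from the linked Java example. It swaps in the third-party pyarrow library (assumed to be installed) and again assumes a hypothetical record layout of a little-endian 32-bit integer followed by a 64-bit float. The names `decode_columns` and `write_parquet` are made up for illustration.

```python
import io
import struct

# Hypothetical record layout (an assumption): int32 id + float64 value.
RECORD = struct.Struct("<id")


def decode_columns(src):
    """Read all fixed-width records from src into column lists."""
    data = src.read()
    if len(data) % RECORD.size:
        raise ValueError("input length is not a whole number of records")
    ids, values = [], []
    for rec_id, value in RECORD.iter_unpack(data):
        ids.append(rec_id)
        values.append(value)
    return {"id": ids, "value": values}


def write_parquet(columns, path):
    """Write the decoded columns to a Parquet file using pyarrow."""
    # Imported here so the decoding step works even without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Explicit schema so the file matches INT and DOUBLE columns.
    schema = pa.schema([("id", pa.int32()), ("value", pa.float64())])
    table = pa.Table.from_pydict(columns, schema=schema)
    pq.write_table(table, path)


if __name__ == "__main__":
    raw = io.BytesIO(RECORD.pack(7, 1.5) + RECORD.pack(8, 3.0))
    print(decode_columns(raw))
```

The resulting file could then be placed in the directory of a table declared with STORED AS PARQUET, so no CSV staging table is needed. Whether that is worthwhile depends on how large the input is, because this sketch holds all records in memory.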

Source: https://stackoverflow.com/questions/28416731/is-it-possible-to-load-parquet-table-directly-from-file

Tags

Hadoop

cloudera-cdh

impala

parquet
