If I have a binary data file (it can be converted to CSV format), is there any way to load a Parquet table directly from it? Many tutorials show loading a CSV file into a text table, and then loading the Parquet table from the text table. From an efficiency point of view, is it possible to load a Parquet table directly from a binary file like the one I already have, ideally using a CREATE EXTERNAL TABLE command? Or do I need to convert it to a CSV file first? Are there any file format restrictions?
Unfortunately, Impala cannot read a custom binary format. You should convert your files to CSV, then create an external table over the existing CSV files as a temporary table, and finally insert into the final Parquet table by selecting from the temporary CSV table. The Impala Parquet documentation has a lot more information and some related examples. See the section about compacting small files, which follows a similar pattern.
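As a rough sketch of that workflow, the Python below decodes a binary file into CSV and then lists the Impala statements that would follow. The record layout (`int32` id, `float64` value, 8-byte ASCII tag), the table names `staging_csv` and `final_parquet`, and the HDFS path are all hypothetical; substitute your own format and schema.

```python
import csv
import struct

# Hypothetical record layout (an assumption, not the asker's real format):
# little-endian int32 id, float64 value, 8-byte NUL-padded ASCII tag.
RECORD = struct.Struct("<id8s")


def binary_to_csv(src_path, dst_path):
    """Decode each fixed-width record and write it as one CSV row.

    Returns the number of rows written.
    """
    rows = 0
    with open(src_path, "rb") as src, open(dst_path, "w", newline="") as dst:
        # Use "\n" line endings so no stray "\r" ends up in the last column.
        writer = csv.writer(dst, lineterminator="\n")
        while True:
            chunk = src.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file; a trailing partial record is ignored
            rec_id, value, tag = RECORD.unpack(chunk)
            writer.writerow([rec_id, value, tag.rstrip(b"\x00").decode("ascii")])
            rows += 1
    return rows


# Statements to run in impala-shell after copying the CSV into the
# (hypothetical) HDFS directory named in LOCATION.
IMPALA_STATEMENTS = """
CREATE EXTERNAL TABLE staging_csv (id INT, value DOUBLE, tag STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/user/hypothetical/staging_csv';

CREATE TABLE final_parquet (id INT, value DOUBLE, tag STRING)
  STORED AS PARQUET;

INSERT INTO final_parquet SELECT * FROM staging_csv;
"""
```

One caveat with this sketch: a plain delimited text table splits on every comma and does not interpret CSV quoting, so values that contain commas would need a different delimiter or escaping.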
I don't know how you convert your file format to CSV, but you might consider writing a program that converts your binary format directly to Parquet. For example, you can write a MapReduce job that writes Parquet files. Here's an example that reads and writes Parquet: https://github.com/cloudera/parquet-examples/blob/master/MapReduce/TestReadWriteParquet.java
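For data small enough to convert on a single machine, the same idea can be sketched without MapReduce. The example below is not the linked Java job; it swaps in Python with the `pyarrow` library, and the record layout and column names are hypothetical stand-ins for your real format.

```python
import struct

# Hypothetical record layout (an assumption): little-endian int32 id,
# float64 value, 8-byte NUL-padded ASCII tag.
RECORD = struct.Struct("<id8s")


def decode_columns(data):
    """Decode packed records into per-column lists.

    len(data) must be a multiple of RECORD.size, otherwise
    iter_unpack raises struct.error.
    """
    columns = {"id": [], "value": [], "tag": []}
    for rec_id, value, tag in RECORD.iter_unpack(data):
        columns["id"].append(rec_id)
        columns["value"].append(value)
        columns["tag"].append(tag.rstrip(b"\x00").decode("ascii"))
    return columns


def write_parquet(columns, dst_path):
    """Write the decoded columns to a Parquet file (requires pyarrow)."""
    # Imported here so the decoding step works without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), dst_path)
```

Typical use would be `write_parquet(decode_columns(open("data.bin", "rb").read()), "data.parquet")`, after which the file can be copied to HDFS and exposed through an external table declared `STORED AS PARQUET`. Whether Impala accepts every column type that `pyarrow` infers depends on the versions involved, so check the schema on a small sample first.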
Source: https://stackoverflow.com/questions/28416731/is-it-possible-to-load-parquet-table-directly-from-file