Pig: loading a data file using an external schema file

Posted by 风格不统一 on 2019-12-19 19:49:42

Question


I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using

A= LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>' 

but get an error.

What is the syntax for correctly loading the file?

The schema file format is something like:

data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"

Answer 1:


The AS clause is for specifying the schema directly, not the path to a schema file. The schema goes in parentheses, without quotes:

 A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);

Alternatively, a file named .pig_schema that contains the schema and sits in your input directory should work as well, although I have never tried it. It must be a JSON file with the following syntax (type 15 is long, type 55 is chararray):

{"fields":[
        {"name":"type","type":15,"description":"Fu","schema":null},
        {"name":"id","type":55,"description":"Bar","schema":null},
        {"name":"nameFormat","type":55,"description":"Xu","schema":null}
    ] ,"version":0,"sortKeys":[],"sortKeyOrders":[]}

This file is also generated if you specify the -schema option when storing with PigStorage.
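Since the schema file in the question is not in Pig's JSON format, one option is to convert it once and drop the result next to the data. The following is a minimal sketch in Python, not something from the original answer. It assumes the layout shown in the question (field name in the second column, type in the eighth), and the helper names `pig_type` and `to_pig_schema`, the type mapping, and the "autogenerated" description text are all illustrative assumptions.

```python
import json

# Pig type codes from org.apache.pig.data.DataType for the types used here.
PIG_LONG = 15
PIG_CHARARRAY = 55

# Sample input copied from the question (raw string keeps '\001' literal).
SAMPLE = r"""data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
"""


def pig_type(source_type):
    """Map a source type such as 'long' or 'varchar(50)' to a Pig type code."""
    base = source_type.split("(")[0].lower()
    if base == "long":
        return PIG_LONG
    if base in ("varchar", "char", "string"):
        return PIG_CHARARRAY
    raise ValueError("unmapped type: " + source_type)


def to_pig_schema(schema_text):
    """Build the .pig_schema structure from the schema file's text."""
    fields = []
    for line in schema_text.splitlines():
        parts = line.split()
        # Field lines look like: data1 <name> - - - - - <type> - "ends ..."
        # The record header line has '-' in the name column, so skip it.
        if len(parts) < 8 or parts[1] == "-":
            continue
        fields.append({
            "name": parts[1],
            "type": pig_type(parts[7]),
            "description": "autogenerated",  # free-form text (assumption)
            "schema": None,
        })
    return {"fields": fields, "version": 0, "sortKeys": [], "sortKeyOrders": []}


if __name__ == "__main__":
    # Print the JSON; save it as .pig_schema in the input directory.
    print(json.dumps(to_pig_schema(SAMPLE), indent=2))
```

Saving the printed JSON as .pig_schema in the directory that holds the data file should let PigStorage pick up the field names and types, subject to the same caveat as above that this route is untested here.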




Answer 2:


It's possible to load data with a schema file.

When you store your data with the '-schema' flag, Pig writes a .pig_schema file to the output path that holds the schema as JSON.

You can use it when loading the data:

B = LOAD '<>' USING PigStorage(',','-schema'); 

You can see the schema by running

describe B;

Check this good post for more details.

This feature is available beginning with Pig 0.10.
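To check what a .pig_schema file declares without starting Pig, the JSON can be turned back into the text of an AS clause. This is a rough sketch under assumptions: the function name `as_clause` and the example JSON are made up for illustration, only flat schemas with simple types are handled, and the type table is a subset of the codes in org.apache.pig.data.DataType.

```python
import json

# Subset of Pig type codes from org.apache.pig.data.DataType.
TYPE_NAMES = {
    5: "boolean",
    10: "int",
    15: "long",
    20: "float",
    25: "double",
    50: "bytearray",
    55: "chararray",
}

# Illustrative .pig_schema content (not taken from a real job).
EXAMPLE = """{"fields":[
    {"name":"type","type":15,"description":"Fu","schema":null},
    {"name":"id","type":55,"description":"Bar","schema":null},
    {"name":"nameFormat","type":55,"description":"Xu","schema":null}
  ],"version":0,"sortKeys":[],"sortKeyOrders":[]}"""


def as_clause(schema_json):
    """Render a flat .pig_schema JSON string as the body of an AS clause."""
    fields = json.loads(schema_json)["fields"]
    parts = [f["name"] + ":" + TYPE_NAMES[f["type"]] for f in fields]
    return "(" + ", ".join(parts) + ")"


if __name__ == "__main__":
    print(as_clause(EXAMPLE))
```

The result can be compared against what describe reports for the loaded relation, or pasted after AS in a LOAD statement when the schema file is not available.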



Source: https://stackoverflow.com/questions/20173335/pig-loading-a-data-file-using-an-external-schema-file
