Question
I have a Hadoop data store I'm accessing in Pig and not a lot of documentation on it, plus I'm new to Pig, so I am looking for the Pig equivalent of "SHOW TABLES". When I have a connection to a MySQL db I can do this and get a general sense of what data is in there; I have found several tutorials but nothing on point. Is there such a command? If not, is there some other way to orient myself to a Hadoop data store I know nothing about?
ETA: This would be when running Pig in interactive mode, rather than loading a script. Probably obvious, but I thought I should mention it.
Answer 1:
Pig doesn't have a concept of tables. It can read any file that is on your HDFS filesystem and stores the parsed result in a relation.
Note that you can also run HDFS filesystem commands from the Grunt shell using the fs command (for example, fs -ls /).
It's probably best to familiarise yourself with HDFS first and make sure you can comfortably navigate the filesystem, so you can find the data you want to process with Pig.
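As a rough sketch of that orientation process, the statements below (typed at the grunt> prompt) list directories, peek at a file, and then load it into a relation. The paths /data and /data/some_dir/part-00000 are hypothetical placeholders, not locations from the question, and the comma delimiter is an assumption about the file format.

```pig
-- List the top of the HDFS tree, then drill down (paths are hypothetical).
fs -ls /
fs -ls /data

-- Look at the end of a file to guess its delimiter and columns.
fs -tail /data/some_dir/part-00000

-- Load the file without declaring a schema (delimiter is an assumption);
-- fields are then addressed by position as $0, $1, ...
raw = LOAD '/data/some_dir/part-00000' USING PigStorage(',');

-- Print what Pig knows about the relation's schema, then show a few rows.
DESCRIBE raw;
first_rows = LIMIT raw 10;
DUMP first_rows;
```

Limiting before DUMP keeps the output small, which matters when the size of the file is unknown.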
Answer 2:
The closest thing I can see to 'show tables' is the 'history' command, which effectively lists all aliases created.
grunt> history
1 a = LOAD 'iris.csv' USING PigStorage (',') AS
(sl:double,sw:double,pl:double,pw:double,spec:int);
2 b = FILTER a BY spec==1;
3 c = GROUP b BY pw;
4 d = FOREACH c GENERATE COUNT(b);
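history shows the statements that were entered, but not the schemas they produce. To inspect an individual alias from the listing above, DESCRIBE can be used. This is a minimal sketch that assumes the aliases a and d from that session are still defined.

```pig
-- Schema of alias a, as declared in its LOAD statement.
DESCRIBE a;

-- Schema of the derived alias d, as inferred by Pig.
DESCRIBE d;

-- Run the pipeline and print the contents of d.
DUMP d;
```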
Answer 3:
We also came across a similar situation and tried all the solutions on Stack Overflow, but none of them solved our issue. The solution to this problem is to use Pig's store command and to provide a dedicated folder for it. The setup we prefer is:
grunt> fs -mkdir /user/hduser/AllPigTableStructures/
grunt> fs -chmod 777 /user/hduser/AllPigTableStructures/
Now we will store all table information in this folder, named "AllPigTableStructures". Then use the store command as in the code below:
grunt> store extract_details into '/user/hduser/AllPigTableStructures/SchemaTwit' using PigStorage('\t', '-schema');
The last line of the output should be:
/*2017-09-18 02:13:56,566 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
*/
You should now see a folder named SchemaTwit, like this:
grunt> fs -ls /user/hduser/AllPigTableStructures
Found 12 items
drwxr-xr-x - hduser supergroup 0 2017-09-18 02:13 /user/hduser/AllPigTableStructures/SchemaTwit
Finally, if you list the contents of the SchemaTwit directory, you will find the schema of your table and its other details. The command is below; the part-m-xxx files contain the data itself.
grunt> fs -ls /user/hduser/AllPigTableStructures/SchemaTwit
Found 4 items
-rw-r--r-- 2 hduser supergroup 8 2017-09-18 02:26 /user/hduser/AllPigTableStructures/SchemaTwit/.pig_header
-rw-r--r-- 2 hduser supergroup 239 2017-09-18 02:26 /user/hduser/AllPigTableStructures/SchemaTwit/.pig_schema
-rw-r--r-- 2 hduser supergroup 0 2017-09-18 02:26 /user/hduser/AllPigTableStructures/SchemaTwit/_SUCCESS
-rw-r--r-- 2 hduser supergroup 140 2017-09-18 02:26 /user/hduser/AllPigTableStructures/SchemaTwit/part-m-00000
Now you can use the cat command below on the schema file to see the schema of your table, or on a part-m-xxx file to browse the data:
grunt> fs -cat /user/hduser/AllPigTableStructures/SchemaTwit/.pig_schema
{"fields":[{"name":"id","type":50,"description":"autogenerated from Pig Field Schema","schema":null},{"name":"text","type":50,"description":"autogenerated from Pig Field Schema","schema":null}],"version":0,"sortKeys":[],"sortKeyOrders":[]}
To load your table together with its schema, this command helps:
WithSchema = LOAD '/user/hduser/AllPigTableStructures/SchemaTwit';
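To check that the stored schema was actually picked up, the loaded relation can be described. The sketch below assumes the directory written by the store command above; whether a plain LOAD reads .pig_schema automatically may depend on the Pig version, so the explicit '-schema' option is shown as the more conservative form. The alias name WithSchemaExplicit is made up for illustration.

```pig
-- Explicitly ask PigStorage to use the stored schema files.
WithSchemaExplicit = LOAD '/user/hduser/AllPigTableStructures/SchemaTwit'
    USING PigStorage('\t', '-schema');

-- Should list the fields recorded in .pig_schema (here: id and text).
DESCRIBE WithSchemaExplicit;
```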
PS: We are running Pig in MapReduce mode.
Answer 4:
Looks like you have misunderstood Pig. As @seedhead has pointed out, you handle files with Pig. Folks quite often mistake it for a database (like HBase) or a warehouse (like Hive), which it is not. As far as visualizing the data is concerned, you can list the files and directories through the Pig shell. And if you need to see how many records (or lines) a particular file has, you could do something like this:
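Records = LOAD '/path_of_the_file';
-- Put every record into a single group so COUNT sees the whole file.
Records_Group = GROUP Records ALL;
Records_Count = FOREACH Records_Group GENERATE COUNT(Records);
-- Nothing runs until output is requested; DUMP prints the count.
DUMP Records_Count;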
Source: https://stackoverflow.com/questions/16529487/is-there-an-apache-pig-equivalent-of-show-tables