How to load files on a Hadoop cluster using Apache Pig?

Asked by 北海茫月, 2021-01-03 08:48

I have a Pig script and need to load files from the local Hadoop cluster. I can list the files using the Hadoop command `hadoop fs -ls /repo/mydata`, but when I tried to load the file…

3 Answers
  •  天涯浪人, answered 2021-01-03 09:31

    My suggestion:

    1. Create a folder in HDFS: hadoop fs -mkdir /pigdata

    2. Copy the file into the newly created HDFS folder: hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata

    (or you can do it from the Grunt shell: grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata)
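Steps 1 and 2 above can be sketched end to end as a short shell session; the paths and the target directory are taken from the answer, and the sketch assumes the hadoop CLI is on the PATH and pointed at your cluster:

```shell
# Step 1: create the target directory in HDFS
hadoop fs -mkdir /pigdata

# Step 2: copy the sample log from the local filesystem into HDFS
hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata

# Sanity check: the file should now appear where the LOAD statement expects it
hadoop fs -ls /pigdata
```

If the -ls output shows excite-small.log under /pigdata, the Pig script's LOAD path will resolve.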

    3. Execute the Pig Latin script:

         grunt> set debug on
      
         grunt> set job.name 'first-p2-job'
      
         grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS 
                    (user:chararray, time:long, query:chararray); 
         grunt> grpd = GROUP log BY user; 
         grunt> cntd = FOREACH grpd GENERATE group, COUNT(log); 
         grunt> STORE cntd INTO 'output';
      
    4. The output will be stored in hdfs://hostname:54310/pigdata/output (note that STORE with a relative path like 'output' writes relative to the current HDFS working directory).
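    As an alternative to typing the statements interactively in Grunt, the same job can be run in batch mode. This is a hedged sketch: the script name wordcount.pig is my own choice, and the hostname/port and paths are the ones assumed in the answer above:

    ```shell
    # Save the Pig Latin statements from step 3 to a script file
    cat > wordcount.pig <<'EOF'
    log  = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log'
           AS (user:chararray, time:long, query:chararray);
    grpd = GROUP log BY user;
    cntd = FOREACH grpd GENERATE group, COUNT(log);
    STORE cntd INTO 'output';
    EOF

    # Run the script in batch mode instead of the interactive Grunt shell
    pig wordcount.pig

    # Inspect the part files that STORE wrote to the output directory
    hadoop fs -cat output/part-* | head
    ```

    Batch mode is convenient once the script works, since the whole pipeline can be rerun with one command.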
