How do I count the lines in a file on HDFS from the command line?

梦如初夏 2020-12-13 19:26

I have a file (testfile) on HDFS, and I want to know how many lines it contains.

In Linux, I can do:

wc -l testfile

Can I do something similar with a hadoop fs command?

3 Answers
  • 2020-12-13 20:00

    You cannot do it with a hadoop fs command alone. Either write a MapReduce job with the counting logic explained in this post, or use a Pig script like this one:

    A = LOAD 'file' using PigStorage() as(...);
    B = group A all;
    cnt = foreach B generate COUNT(A);
    

    Make sure your Snappy file has the correct extension so that Pig can detect and read it.
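The MapReduce logic this answer refers to can be sketched locally: each mapper counts the lines in its own input split, and the reducer sums the per-split counts. A minimal shell simulation of that idea (all filenames under /tmp are hypothetical):

```shell
# Simulate "each mapper counts its split, the reducer sums the counts".
printf 'a\nb\nc\nd\ne\n' > /tmp/testfile        # sample 5-line input
split -l 2 /tmp/testfile /tmp/lc_split_         # fake 2-line input splits
# "map": wc -l per split; "reduce": sum them, skipping wc's own total row
wc -l /tmp/lc_split_* | awk '$2 != "total" {sum += $1} END {print sum}'
```

In a real job each count happens on the node holding the split, so nothing is streamed to the client, unlike the `hadoop fs -cat | wc -l` approach.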

  • 2020-12-13 20:03

    Total number of files: hadoop fs -ls /path/to/hdfs/* | wc -l (note: when listing a directory, hadoop fs -ls prints a "Found N items" header line, which adds one to the count)

    Total number of lines: hadoop fs -cat /path/to/hdfs/* | wc -l

    Total number of lines in a given file: hadoop fs -cat /path/to/hdfs/filename | wc -l
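One caveat: hadoop fs -cat streams the raw stored bytes, so piping a compressed file into wc -l counts lines of compressed data, not of the text; hadoop fs -text decompresses files with a recognized codec extension before printing. The same distinction, shown locally with gzip (the sample path is hypothetical):

```shell
# Counting compressed bytes directly gives a meaningless result;
# decompress first (locally: gunzip -c; on HDFS: hadoop fs -text).
printf 'one\ntwo\nthree\n' | gzip > /tmp/sample.txt.gz
gunzip -c /tmp/sample.txt.gz | wc -l    # counts the 3 original lines
```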

  • 2020-12-13 20:08

    1. Number of lines of a mapper output file:

    `hadoop fs -cat /user/cloudera/output/part-m-00000 | wc -l`

    2. Number of lines of a text (or any other) file on HDFS:

    `hadoop fs -cat /user/cloudera/output/abc.txt | wc -l`

    3. First (header) 5 lines of a text (or any other) file on HDFS:

    `hadoop fs -cat /user/cloudera/output/abc.txt | head -5`

    4. Last 10 lines of a text (or any other) file on HDFS:

    `hadoop fs -cat /user/cloudera/output/abc.txt | tail -10`