How to count lines in a document?

慢半拍i 2020-11-27 08:47

I have lines like these, and I want to know how many lines I actually have...

09:16:39 AM  all    2.00    0.00    4.00    0.00    0.00    0.00    0.00    0.0         


        
24 Answers
  •  时光取名叫无心
    2020-11-27 09:19

    wc -l does not count lines.

    Yes, this answer may be a bit late to the party, but I haven't seen anyone document a more robust solution in the answers yet.

    Contrary to popular belief, POSIX does not require files to end with a newline character at all. Yes, the definition of a POSIX 3.206 Line is as follows:

    A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

    However, what many people are not aware of is that POSIX also defines POSIX 3.195 Incomplete Line as:

    A sequence of one or more non-<newline> characters at the end of the file.

    Hence, files without a trailing LF are perfectly POSIX-compliant.

    If you choose not to support both EOF types, your program is not POSIX-compliant.
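    To support both EOF types, a script first has to tell them apart. Below is a minimal POSIX sh sketch, assuming `tail -c` is available (it is specified by POSIX). The function name `ends_with_incomplete_line` and the sample file names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: report whether a file ends in a POSIX "incomplete line",
# i.e. its last byte is something other than <newline> (0x0a).
# ends_with_incomplete_line is a hypothetical name for this example.

ends_with_incomplete_line() {
    # $(...) strips trailing newlines, so the result is empty when the
    # last byte is "\n" and also when the file is empty.
    [ -n "$(tail -c 1 "$1")" ]
}

tmp=$(mktemp -d)
printf 'first\nsecond\n' > "$tmp/terminated.txt"
printf 'first\nsecond'   > "$tmp/incomplete.txt"

for f in terminated.txt incomplete.txt; do
    if ends_with_incomplete_line "$tmp/$f"; then
        echo "$f: ends with an incomplete line"
    else
        echo "$f: every line is terminated"
    fi
done
# prints:
# terminated.txt: every line is terminated
# incomplete.txt: ends with an incomplete line

rm -r "$tmp"
```

    An empty file is reported as "terminated" here, which matches POSIX: it contains no incomplete line because it contains no characters at all.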

    As an example, let's have a look at the following file.

    1 This is the first line.
    2 This is the second line.
    

    No matter the EOF, I'm sure you would agree that there are two lines. You figured that out by looking at how many lines have been started, not by looking at how many lines have been terminated. In other words, as per POSIX, these two files both have the same number of lines:

    1 This is the first line.\n
    2 This is the second line.\n
    
    1 This is the first line.\n
    2 This is the second line.
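    The "count the lines that have been started" rule can be written as simple arithmetic: the number of <newline> characters, plus one if the file ends with an incomplete line. A minimal sketch, assuming the leading 1 and 2 in the examples are line labels rather than file content; `count_started_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count lines by counting lines that have been *started*.
# Every <newline> terminates one line; a final incomplete line (no
# trailing <newline>) adds one more.
# count_started_lines is a hypothetical helper name for this example.

count_started_lines() {
    # $(( ... )) also strips the blanks some wc implementations pad with.
    n=$(( $(wc -l < "$1") ))
    if [ -n "$(tail -c 1 "$1")" ]; then
        n=$(( n + 1 ))   # the last line was started but never terminated
    fi
    echo "$n"
}

tmp=$(mktemp -d)
# The two example files (assumption: "1"/"2" are labels, not content).
printf 'This is the first line.\nThis is the second line.\n' > "$tmp/a.txt"
printf 'This is the first line.\nThis is the second line.'   > "$tmp/b.txt"

count_started_lines "$tmp/a.txt"   # prints 2
count_started_lines "$tmp/b.txt"   # prints 2

rm -r "$tmp"
```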
    

    The man page is relatively clear about wc counting newlines, with a newline just being a 0x0a character:

    NAME
           wc - print newline, word, and byte counts for each file
    

    Hence, wc doesn't even attempt to count what you might call a "line". Using wc to count lines can very well lead to miscounts, depending on the EOF of your input file. For the second file above, for instance, wc -l prints 1 rather than 2.
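    To see the miscount for yourself, the sketch below rebuilds the two example files with printf and compares wc -l against a direct count of 0x0a bytes made with tr. The helper `newline_bytes` and the file names are hypothetical; the arithmetic expansion around each count is there because some wc implementations (e.g. BSD/macOS) pad the number with blanks.

```shell
#!/bin/sh
# Sketch: wc -l reports the number of <newline> (0x0a) bytes, so it
# undercounts a file whose last line is incomplete.

# Hypothetical helper: count 0x0a bytes directly by deleting every
# other byte (tr -cd) and counting what is left.
newline_bytes() {
    echo $(( $(tr -cd '\n' < "$1" | wc -c) ))
}

tmp=$(mktemp -d)
printf 'This is the first line.\nThis is the second line.\n' > "$tmp/a.txt"
printf 'This is the first line.\nThis is the second line.'   > "$tmp/b.txt"

for f in a.txt b.txt; do
    # $(( ... )) drops any padding blanks in wc's output.
    by_wc=$(( $(wc -l < "$tmp/$f") ))
    echo "$f: wc -l=$by_wc, newline bytes=$(newline_bytes "$tmp/$f")"
done
# prints:
# a.txt: wc -l=2, newline bytes=2
# b.txt: wc -l=1, newline bytes=1

rm -r "$tmp"
```

    Both columns agree for every file, which is the point: wc -l is a newline counter, not a line counter.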

    POSIX-compliant solution

    You can use grep to count lines the same way as in the example above, by counting how many lines have been started. This solution is both more robust and more precise, and it supports all the different flavors of what a line in your file could be:

    • POSIX 3.75 Blank Line
    • POSIX 3.145 Empty Line
    • POSIX 3.195 Incomplete Line
    • POSIX 3.206 Line
    $ grep -c '^' FILE
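    Here is a small sketch that wraps the grep approach in a function and runs it on a terminated file, an incomplete-line file, an empty file and a file of two empty lines. `count_lines` is a hypothetical name. Quoting '^' is a precaution rather than a requirement in bash, since some shells (the historical Bourne shell, for one) give a bare ^ a special meaning. The exit-status handling is needed because grep exits with status 1 when nothing matches, as happens for an empty file, even though it still prints 0.

```shell
#!/bin/sh
# Sketch: count POSIX lines (terminated or incomplete) with grep -c '^'.
# '^' matches at the start of every line, so each started line counts
# once, including a final line that lacks a trailing <newline>.
# count_lines is a hypothetical wrapper name for this example.

count_lines() {
    # grep prints 0 but exits 1 when nothing matches (e.g. an empty
    # file); treat that as success and let real errors (status 2) fail.
    grep -c '^' "$1" || [ $? -eq 1 ]
}

tmp=$(mktemp -d)
printf 'This is the first line.\nThis is the second line.\n' > "$tmp/a.txt"
printf 'This is the first line.\nThis is the second line.'   > "$tmp/b.txt"
: > "$tmp/empty.txt"
printf '\n\n' > "$tmp/blank.txt"   # two empty lines

for f in a.txt b.txt empty.txt blank.txt; do
    echo "$f: $(count_lines "$tmp/$f")"
done
# prints:
# a.txt: 2
# b.txt: 2
# empty.txt: 0
# blank.txt: 2

rm -r "$tmp"
```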
    
