Is it possible to get the number of rows in a CSV file without opening it?

逝去的感伤 2020-12-03 10:13

I have a CSV file of about 1 GB, and because my laptop has a basic configuration, I'm not able to open the file in Excel or R. But out of curiosity, I would like to get the numb

4 Answers
  •  抹茶落季
    2020-12-03 10:40

    Option 1:

    Through a file connection, count.fields() counts the number of fields on each line of the file, based on some sep value (which we don't care about here). It returns one element per line, so the length of that result should, in principle, be the number of lines (and therefore rows) in the file.

    length(count.fields(filename))
    

    If you have a header row, you can skip it with skip = 1:

    length(count.fields(filename, skip = 1))
    

    There are other arguments that you can adjust for your specific needs, like skipping blank lines.

    args(count.fields)
    # function (file, sep = "", quote = "\"'", skip = 0, blank.lines.skip = TRUE, 
    #     comment.char = "#") 
    # NULL
    

    See help(count.fields) for more.
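
    As a rough, self-contained illustration of Option 1, the sketch below writes a small CSV to a temporary file and counts its rows. The file contents and the variable names (csv_path, fields_per_row, n_rows, n_lines) are made up for this example, and sep = "," is passed only so that the per-line field counts are meaningful for a comma-separated file.

```r
# Hypothetical sample data, written to a temporary file for illustration only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,alice,90",
             "2,bob,85",
             "",
             "3,carol,77"), csv_path)

# count.fields() returns one element per line it keeps.
# skip = 1 drops the header line; blank lines are skipped by default.
fields_per_row <- count.fields(csv_path, sep = ",", skip = 1)
n_rows <- length(fields_per_row)
n_rows

# Same call, but keeping blank lines in the result.
n_lines <- length(count.fields(csv_path, sep = ",", skip = 1,
                               blank.lines.skip = FALSE))
n_lines

unlink(csv_path)
```

    Whether blank lines should count as rows depends on how the file will be read later, so it is worth setting blank.lines.skip to match.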

    It's not too bad as far as speed goes. I tested it on one of my baseball files that contains 99846 rows.

    nrow(data.table::fread("Batting.csv"))
    # [1] 99846
    
    system.time({ l <- length(count.fields("Batting.csv", skip = 1)) })
    #   user  system elapsed 
    #  0.528   0.000   0.503 
    
    l
    # [1] 99846
    file.info("Batting.csv")$size
    # [1] 6153740
    

    Option 2 (the more efficient one): Another idea is to use data.table::fread() to read only the first column and then take the number of rows. This is very fast.

    system.time(nrow(data.table::fread("Batting.csv", select = 1L)))
    #   user  system elapsed 
    #  0.063   0.000   0.063 
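
    For a version of Option 2 that can be run without the baseball file, here is a minimal sketch on a temporary CSV. It assumes the data.table package is installed; the sample data and the names csv_path, first_col and n_rows are invented for the example.

```r
# Assumes the data.table package is installed.
library(data.table)

# Hypothetical sample data, written to a temporary file for illustration only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,alice,90",
             "2,bob,85",
             "3,carol,77"), csv_path)

# select = 1L keeps only the first column; header = TRUE treats the
# first line as column names, so it is not counted as a data row.
first_col <- fread(csv_path, select = 1L, header = TRUE)
n_rows <- nrow(first_col)
n_rows

unlink(csv_path)
```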
    
