I have a CSV file of size ~1 GB, and as my laptop has only a basic configuration, I'm not able to open the file in Excel or R. But out of curiosity, I would like to get the numb
Option 1:
Through a file connection, count.fields() counts the number of fields per line of the file based on some sep value (that we don't care about here). So if we take the length of that result, we should in theory end up with the number of lines (and rows) in the file.
length(count.fields(filename))
If you have a header row, you can skip it with skip = 1:
length(count.fields(filename, skip = 1))
There are other arguments that you can adjust for your specific needs, like skipping blank lines.
args(count.fields)
# function (file, sep = "", quote = "\"'", skip = 0, blank.lines.skip = TRUE,
# comment.char = "#")
# NULL
See help(count.fields) for more.
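To make the arguments above concrete, here is a small sketch on a throwaway file. The file contents and the variable names (tmp, n_rows, n_with_blank) are made up for illustration, and sep = "," is passed only because the toy file is comma-separated; the length of the result is what matters for counting.

```r
# Write a tiny, hypothetical CSV with a header and one blank line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ann,90", "2,Bob,85", "", "3,Cy,70"), tmp)

# One entry per non-blank line after the header (blank.lines.skip = TRUE by default).
n_rows <- length(count.fields(tmp, sep = ",", skip = 1))
n_rows

# Same call, but keeping blank lines in the result.
n_with_blank <- length(count.fields(tmp, sep = ",", skip = 1,
                                    blank.lines.skip = FALSE))
n_with_blank
```

If your data can contain quote characters or "#" inside fields, it may be worth checking the quote and comment.char arguments as well.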
It's not too bad as far as speed goes. I tested it on one of my baseball files that contains 99846 rows.
nrow(data.table::fread("Batting.csv"))
# [1] 99846
system.time({ l <- length(count.fields("Batting.csv", skip = 1)) })
# user system elapsed
# 0.528 0.000 0.503
l
# [1] 99846
file.info("Batting.csv")$size
# [1] 6153740
(The more efficient) Option 2: Another idea is to use data.table::fread() to read the first column only, then take the number of rows. This is very fast, and only a single column ends up in memory.
system.time(nrow(data.table::fread("Batting.csv", select = 1L)))
# user system elapsed
# 0.063 0.000 0.063
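If memory is the main constraint (as with the ~1 GB file in the question), a third possibility, not one of the two options above, is to count lines in chunks with base R's readLines() on an open connection. This is only a sketch: count_lines and chunk_size are hypothetical names, and it counts physical lines, so a quoted field containing a line break would be counted more than once.

```r
# Count the lines of a text file without holding the whole file in memory.
# count_lines() and chunk_size are illustrative names, not part of any package.
count_lines <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))            # always release the connection
  n <- 0                         # numeric, so very large counts cannot overflow
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break   # nothing left to read
    n <- n + length(chunk)
  }
  n
}

# Small self-check on a temporary file: header plus four data lines.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,a", "2,b", "3,c", "4,d"), tmp)
total_lines <- count_lines(tmp, chunk_size = 2L)
data_rows <- total_lines - 1     # subtract the header row
```

On a real file you would call something like count_lines("Batting.csv") - 1; a larger chunk_size generally means fewer read calls at the cost of more memory per chunk.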