I need to read the last 25 lines from a file (for displaying the most recent log entries). Is there anyway in Ruby to start at the end of a file and read it backwards?
I just wrote a quick implemenation with #seek
:
class File
def tail(n)
buffer = 1024
idx = (size - buffer).abs
chunks = []
lines = 0
begin
seek(idx)
chunk = read(buffer)
lines += chunk.count("\n")
chunks.unshift chunk
idx -= buffer
end while lines < n && pos != 0
chunks.join.lines.reverse_each.take(n).reverse.join
end
end
File.open('rpn-calculator.rb') do |f|
p f.tail(10)
end
Here's a version of tail that doesn't store any buffers in memory while you go, but instead uses "pointers". Also does bound-checking so you don't end up seeking to a negative offset (if for example you have more to read but less than your chunk size left).
def tail(path, n)
file = File.open(path, "r")
buffer_s = 512
line_count = 0
file.seek(0, IO::SEEK_END)
offset = file.pos # we start at the end
while line_count <= n && offset > 0
to_read = if (offset - buffer_s) < 0
offset
else
buffer_s
end
file.seek(offset-to_read)
data = file.read(to_read)
data.reverse.each_char do |c|
if line_count > n
offset += 1
break
end
offset -= 1
if c == "\n"
line_count += 1
end
end
end
file.seek(offset)
data = file.read
end
test cases at https://gist.github.com/shaiguitar/6d926587e98fc8a5e301
There is a library for Ruby called File::Tail. This can get you the last N lines of a file just like the UNIX tail utility.
I assume there is some seek optimization in place in the UNIX version of tail with benchmarks like these (tested on a text file just over 11M):
[john@awesome]$du -sh 11M.txt
11M 11M.txt
[john@awesome]$time tail -n 25 11M.txt
/sbin/ypbind
/sbin/arptables
/sbin/arptables-save
/sbin/change_console
/sbin/mount.vmhgfs
/misc
/csait
/csait/course
/.autofsck
/~
/usb
/cdrom
/homebk
/staff
/staff/faculty
/staff/faculty/darlinr
/staff/csadm
/staff/csadm/service_monitor.sh
/staff/csadm/.bash_history
/staff/csadm/mysql5
/staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
/staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
/staff/csadm/glibc-2.3.4-2.39.i386.rpm
/staff/csadm/csunixdb.tgz
/staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm
real 0m0.012s
user 0m0.000s
sys 0m0.010s
I can only imagine the Ruby library uses a similar method.
Edit:
for Pax's curiosity:
[john@awesome]$time cat 11M.txt | tail -n 25
/sbin/ypbind
/sbin/arptables
/sbin/arptables-save
/sbin/change_console
/sbin/mount.vmhgfs
/misc
/csait
/csait/course
/.autofsck
/~
/usb
/cdrom
/homebk
/staff
/staff/faculty
/staff/faculty/darlinr
/staff/csadm
/staff/csadm/service_monitor.sh
/staff/csadm/.bash_history
/staff/csadm/mysql5
/staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
/staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
/staff/csadm/glibc-2.3.4-2.39.i386.rpm
/staff/csadm/csunixdb.tgz
/staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm
real 0m0.350s
user 0m0.000s
sys 0m0.130s
still under a second, but if there is a lot of file operations this makes a big difference.
If on a *nix system with tail
, you can cheat like this:
last_25_lines = `tail -n 25 whatever.txt`
Improved version of manveru's excellent seek-based solution. This one returns exactly n lines.
class File
def tail(n)
buffer = 1024
idx = [size - buffer, 0].min
chunks = []
lines = 0
begin
seek(idx)
chunk = read(buffer)
lines += chunk.count("\n")
chunks.unshift chunk
idx -= buffer
end while lines < ( n + 1 ) && pos != 0
tail_of_file = chunks.join('')
ary = tail_of_file.split(/\n/)
lines_to_return = ary[ ary.size - n, ary.size - 1 ]
end
end
I can't vouch for Ruby but most of these languages follow the C idiom of file I/O. That means there's no way to do what you ask other than searching. This usually takes one of two approaches.
The second way is the one I prefer since, if you choose your first offset wisely, you'll almost certainly only need one shot at it. Log files still tend to have fixed maximum line lengths (I think coders still have a propensity for 80-column files long after their usefulness has degraded). I tend to choose number of lines desired multiplied by 132 as my offset.
And from a cursory glance of Ruby docs online, it looks like it does follow the C idiom. You would use "ios.seek(25*-132,IO::SEEK_END)"
if you were to follow my advice, then read forward from there.