Reading the last n lines of a file in Ruby?

北荒 2020-11-30 09:59

I need to read the last 25 lines from a file (for displaying the most recent log entries). Is there any way in Ruby to start at the end of a file and read it backwards?

8 answers
  • 2020-11-30 10:07

    I just wrote a quick implementation with #seek:

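    class File
      # Returns the last n lines of the file as a single string.
      def tail(n)
        buffer = 1024
        idx = size # everything from idx to the end has been read
        chunks = []
        lines = 0
    
        # n + 1 newlines guarantee that the first of the n lines is complete.
        while lines <= n && idx > 0
          to_read = [buffer, idx].min # clamp so we never seek below 0
          idx -= to_read
          seek(idx)
          chunk = read(to_read)
          lines += chunk.count("\n")
          chunks.unshift chunk
        end
    
        chunks.join.lines.last(n).join
      end
    end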
    
    File.open('rpn-calculator.rb') do |f|
      p f.tail(10)
    end
    
  • 2020-11-30 10:07

    Here's a version of tail that doesn't accumulate chunks in memory as it goes; it only tracks a file offset. It also does bounds checking so you don't end up seeking to a negative offset (for example, when there is more to read but less than a full chunk left).

    def tail(path, n)
      File.open(path, "r") do |file| # block form closes the file for us
        buffer_s = 512
        line_count = 0
        file.seek(0, IO::SEEK_END)
    
        offset = file.pos # we start at the end
    
        while line_count <= n && offset > 0
          to_read = [buffer_s, offset].min # never seek to a negative offset
    
          file.seek(offset - to_read)
          data = file.read(to_read)
    
          # Walk the chunk backwards, counting newlines.
          data.reverse.each_char do |c|
            line_count += 1 if c == "\n"
            # Stop with offset just past the (n+1)th newline from the end.
            break if line_count > n
            offset -= 1
          end
        end
    
        file.seek(offset)
        file.read
      end
    end
    
    

    Test cases are at https://gist.github.com/shaiguitar/6d926587e98fc8a5e301

  • 2020-11-30 10:13

    There is a library for Ruby called File::Tail. This can get you the last N lines of a file just like the UNIX tail utility.

    Judging by benchmarks like these (run on a text file just over 11M), I assume the UNIX version of tail seeks from the end of the file rather than reading all of it:

    [john@awesome]$du -sh 11M.txt
    11M     11M.txt
    [john@awesome]$time tail -n 25 11M.txt
    /sbin/ypbind
    /sbin/arptables
    /sbin/arptables-save
    /sbin/change_console
    /sbin/mount.vmhgfs
    /misc
    /csait
    /csait/course
    /.autofsck
    /~
    /usb
    /cdrom
    /homebk
    /staff
    /staff/faculty
    /staff/faculty/darlinr
    /staff/csadm
    /staff/csadm/service_monitor.sh
    /staff/csadm/.bash_history
    /staff/csadm/mysql5
    /staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
    /staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
    /staff/csadm/glibc-2.3.4-2.39.i386.rpm
    /staff/csadm/csunixdb.tgz
    /staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm
    
    real    0m0.012s
    user    0m0.000s
    sys     0m0.010s
    

    I can only imagine the Ruby library uses a similar method.

    Edit:

    for Pax's curiosity:

    [john@awesome]$time cat 11M.txt | tail -n 25
    /sbin/ypbind
    /sbin/arptables
    /sbin/arptables-save
    /sbin/change_console
    /sbin/mount.vmhgfs
    /misc
    /csait
    /csait/course
    /.autofsck
    /~
    /usb
    /cdrom
    /homebk
    /staff
    /staff/faculty
    /staff/faculty/darlinr
    /staff/csadm
    /staff/csadm/service_monitor.sh
    /staff/csadm/.bash_history
    /staff/csadm/mysql5
    /staff/csadm/mysql5/MySQL-server-community-5.0.45-0.rhel5.i386.rpm
    /staff/csadm/glibc-common-2.3.4-2.39.i386.rpm
    /staff/csadm/glibc-2.3.4-2.39.i386.rpm
    /staff/csadm/csunixdb.tgz
    /staff/csadm/glibc-headers-2.3.4-2.39.i386.rpm
    
    real    0m0.350s
    user    0m0.000s
    sys     0m0.130s
    

    Still under a second, but if there are a lot of file operations this makes a big difference.

  • 2020-11-30 10:22

    If on a *nix system with tail, you can cheat like this:

    last_25_lines = `tail -n 25 whatever.txt`
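
    The backtick form passes the whole command through a shell, which matters if the file name is not a fixed string. A minimal sketch that avoids the shell with Open3.capture2 from the standard library; the helper name tail_via_command is hypothetical, and it still assumes a tail binary on the PATH:

```ruby
require "open3"

# Hypothetical helper: runs the system `tail` without going through a shell,
# so spaces or metacharacters in `path` are passed through literally.
# Assumes a POSIX `tail` is available on the PATH.
def tail_via_command(path, n)
  output, status = Open3.capture2("tail", "-n", n.to_s, "--", path)
  raise "tail failed for #{path}" unless status.success?
  output
end
```

    Usage would look like last_25_lines = tail_via_command("whatever.txt", 25).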
    
  • 2020-11-30 10:22

    Improved version of manveru's excellent seek-based solution. This one returns exactly n lines.

    class File
    
      # Returns the last n lines as an array, without trailing newlines.
      def tail(n)
        buffer = 1024
        idx = size # everything from idx to the end has been read
        chunks = []
        lines = 0
    
        while lines < ( n + 1 ) && idx > 0
          to_read = [buffer, idx].min # clamp so we never seek below 0
          idx -= to_read
          seek(idx)
          chunk = read(to_read)
          lines += chunk.count("\n")
          chunks.unshift chunk
        end
    
        tail_of_file = chunks.join('')
        ary = tail_of_file.split(/\n/)
        ary.last(n) # fewer than n only when the file itself is shorter
    
      end
    end
    
  • 2020-11-30 10:22

    I can't vouch for Ruby but most of these languages follow the C idiom of file I/O. That means there's no way to do what you ask other than searching. This usually takes one of two approaches.

    • Starting at the start of the file and scanning it all, remembering the most recent 25 lines. Then, when you hit end of file, print them out.
    • A similar approach but attempting to seek to a best-guess location first. That means seeking to (for example) end of file minus 4000 characters, then doing exactly what you did in the first approach with the proviso that, if you didn't get 25 lines, you have to back up and try again (e.g., to end of file minus 5000 characters).
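
    As a rough Ruby sketch of the first approach (the helper name last_lines_by_scanning is made up for illustration), File.foreach can stream the file while only the most recent n lines are kept:

```ruby
# Hypothetical helper: reads the file front to back, keeping only the most
# recent n lines in memory. Simple, but it touches every byte of the file.
def last_lines_by_scanning(path, n)
  recent = []
  File.foreach(path) do |line|
    recent << line
    recent.shift if recent.size > n # drop the oldest line once we hold more than n
  end
  recent
end
```

    Memory use stays bounded by n lines, but the running time grows with the size of the whole file.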

    The second way is the one I prefer since, if you choose your first offset wisely, you'll almost certainly only need one shot at it. Log files still tend to have fixed maximum line lengths (I think coders still have a propensity for 80-column files long after their usefulness has degraded). I tend to choose number of lines desired multiplied by 132 as my offset.

    And from a cursory glance at the Ruby docs online, it looks like it does follow the C idiom. You would use "ios.seek(25 * -132, IO::SEEK_END)" if you were to follow my advice, then read forward from there.
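
    A minimal sketch of the best-guess approach, under a few assumptions of mine: the helper name last_lines_by_guessing is hypothetical, the guess is doubled on each retry (rather than grown by a fixed amount), and the offset is clamped to the start of the file so that short files do not cause a seek error:

```ruby
# Hypothetical helper: guess an offset from the end (n lines * 132 bytes, the
# figure suggested above), read forward from there, and double the guess if it
# did not cover enough lines.
def last_lines_by_guessing(path, n, bytes_per_line = 132)
  return [] if n <= 0

  File.open(path, "rb") do |file|
    size = file.size
    guess = n * bytes_per_line

    loop do
      start = [size - guess, 0].max # clamp so we never seek before byte 0
      file.seek(start)
      lines = file.read.lines

      # Unless we started at byte 0, the first piece may be a partial line.
      lines.shift if start > 0

      return lines.last(n) if lines.size >= n || start.zero?

      guess *= 2 # not enough complete lines yet: back up further and retry
    end
  end
end
```

    With a sensible bytes_per_line the loop body normally runs once; the retry only kicks in when the lines are longer than expected.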
