Efficient way to render a ton of JSON on Heroku


Question


I built a simple API with one endpoint. It scrapes files and currently has around 30,000 records. I would ideally like to be able to fetch all of those records as JSON with one HTTP call.

Here is my Sinatra view code:

require 'sinatra'
require 'json'
require 'mongoid'

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  Book.all
end

I've tried the following, using multi_json with the yajl engine:

require './require.rb'
require 'sinatra'
require 'multi_json'
MultiJson.engine = :yajl

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  MultiJson.encode(Book.all)
end

The problem with this approach is that I get Error R14 (Memory quota exceeded). I get the same error when I try to use the 'oj' gem.

I would just concatenate everything into one long Redis string, but Heroku's Redis service is $30 per month for the instance size I would need (> 10 MB).

My current solution is to use a background task that creates Mongoid documents and stuffs them full of JSON-encoded records, up to near MongoDB's 16 MB document size limit, roughly as in the sketch below. The problems with this approach: it still takes nearly 30 seconds to render, and I have to run post-processing on the receiving app to properly extract the JSON from the documents.
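
For reference, the chunking job looks roughly like this. This is only a sketch of the workaround described above; the JsonChunk model, its payload field, and the 15 MB threshold are stand-ins I'm using for illustration, not the actual app's code:

require 'mongoid'
require 'multi_json'

# Hypothetical container model; the real schema isn't shown above.
class JsonChunk
  include Mongoid::Document
  field :payload, type: String
end

# Stay safely under MongoDB's 16 MB document size limit.
CHUNK_BYTE_LIMIT = 15 * 1024 * 1024

def rebuild_json_chunks
  JsonChunk.delete_all
  buffer, bytes = [], 0
  Book.all.each do |book|
    json = MultiJson.encode(book.attributes)
    if bytes + json.bytesize > CHUNK_BYTE_LIMIT
      # Flush the current batch as one JSON-array string.
      JsonChunk.create!(payload: "[#{buffer.join(',')}]")
      buffer, bytes = [], 0
    end
    buffer << json
    bytes += json.bytesize
  end
  JsonChunk.create!(payload: "[#{buffer.join(',')}]") unless buffer.empty?
end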

Does anyone have a better idea for how I can render JSON for 30,000 records in one call without switching away from Heroku?


Answer 1:


It sounds like you want to stream the JSON directly to the client instead of building it all up in memory; that's probably the best way to cut down memory usage. You could, for example, use yajl to encode JSON directly to a stream.

Edit: I rewrote the whole example to use yajl, because its API is much more compelling and allows for much cleaner code. I also included an example of reading the response in chunks. Here's the streaming JSON array helper I wrote:

require 'yajl'

module JsonArray
  class StreamWriter
    def initialize(out)
      @out = out
      @encoder = Yajl::Encoder.new
      @first = true
    end

    # Append one object to the array, prefixing a comma
    # before every element except the first.
    def <<(object)
      @out << ',' unless @first
      @out << @encoder.encode(object)
      @out << "\n"
      @first = false
    end
  end

  def self.write_stream(app, &block)
    app.stream do |out|
      out << '['
      block.call StreamWriter.new(out)
      out << ']'
    end
  end
end

Usage:

require 'sinatra'
require 'mongoid'

Mongoid.identity_map_enabled = false

# use a server that supports streaming
set :server, :thin

get '/' do
  content_type :json
  JsonArray.write_stream(self) do |json|
    Book.all.each do |book|
      json << book.attributes
    end
  end
end

To decode on the client side you can read and parse the response in chunks, for example with em-http. Note that this solution requires the client's memory to be large enough to store the entire array of objects. Here's the corresponding streaming parser helper:

require 'yajl'

module JsonArray
  class StreamParser
    def initialize(&callback)
      @parser = Yajl::Parser.new
      @parser.on_parse_complete = callback
    end

    def <<(str)
      @parser << str
    end
  end

  def self.parse_stream(&callback)
    StreamParser.new(&callback)
  end
end

Usage:

require 'em-http'

parser = JsonArray.parse_stream do |object|
  # block is called when we are done parsing the
  # entire array; now we can handle the data
  p object
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end

Alternative solution

You could actually simplify the whole thing a lot if you give up the need to generate a "proper" JSON array. The above solution generates JSON in this form:

[{ ... book_1 ... }
,{ ... book_2 ... }
,{ ... book_3 ... }
...
,{ ... book_n ... }
]

We could, however, stream each book as a separate JSON document and thus reduce the format to the following:

{ ... book_1 ... }
{ ... book_2 ... }
{ ... book_3 ... }
...
{ ... book_n ... }

The code on the server would then be much simpler:

require 'sinatra'
require 'mongoid'
require 'yajl'

Mongoid.identity_map_enabled = false
set :server, :thin

get '/' do
  content_type :json
  encoder = Yajl::Encoder.new
  stream do |out|
    Book.all.each do |book|
      out << encoder.encode(book.attributes) << "\n"
    end
  end
end

As would the client:

require 'em-http'
require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new do |book|
  # this will now be called separately for every book
  p book
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end

The great thing is that the client no longer has to wait for the entire response; instead it parses every book separately. However, this will not work if one of your clients expects one single big JSON array. A client without EventMachine can consume the same newline-delimited stream too, as in the sketch below.
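
For completeness, here is a minimal sketch of such a client using Ruby's standard Net::HTTP together with the same Yajl parser, assuming the server above is running on localhost:4567:

require 'net/http'
require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new do |book|
  # called once per complete JSON document in the stream
  p book
end

uri = URI('http://localhost:4567/')
Net::HTTP.start(uri.host, uri.port) do |http|
  http.request(Net::HTTP::Get.new(uri)) do |response|
    # read_body yields the body in chunks as it arrives
    response.read_body do |chunk|
      parser << chunk
    end
  end
end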



Source: https://stackoverflow.com/questions/24291519/efficient-way-to-render-ton-of-json-on-heroku
