Question
I built a simple API with one endpoint. It scrapes files and currently has around 30,000 records. Ideally, I would like to fetch all of those records as JSON with one HTTP call.
Here is my Sinatra view code:
require 'sinatra'
require 'json'
require 'mongoid'

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  Book.all
end
I've tried using multi_json with the yajl engine:
require './require.rb'
require 'sinatra'
require 'multi_json'

MultiJson.engine = :yajl
Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  MultiJson.encode(Book.all)
end
The problem with this approach is I get Error R14 (Memory quota exceeded). I get the same error when I try to use the 'oj' gem.
I would just concatenate everything into one long Redis string, but Heroku's Redis service is $30 per month for the instance size I would need (> 10 MB).
My current solution is to use a background task that creates objects and stuffs them full of JSON-encoded records, up to nearly the MongoDB document size limit (16 MB). The problems with this approach: it still takes nearly 30 seconds to render, and I have to run post-processing on the receiving app to properly extract the JSON from the objects.
Does anyone have a better idea for how I can render JSON for 30k records in one call without switching away from Heroku?
Answer 1:
Sounds like you want to stream the JSON directly to the client instead of building it all up in memory. It's probably the best way to cut down memory usage. You could for example use yajl to encode JSON directly to a stream.
Edit: I rewrote the entire code for yajl, because its API is much more compelling and allows for much cleaner code. I also included an example for reading the response in chunks. Here's the streamed JSON array helper I wrote:
require 'yajl'

module JsonArray
  class StreamWriter
    def initialize(out)
      super()
      @out = out
      @encoder = Yajl::Encoder.new
      @first = true
    end

    def <<(object)
      @out << ',' unless @first
      @out << @encoder.encode(object)
      @out << "\n"
      @first = false
    end
  end

  def self.write_stream(app, &block)
    app.stream do |out|
      out << '['
      block.call StreamWriter.new(out)
      out << ']'
    end
  end
end
Usage:
require 'sinatra'
require 'mongoid'

Mongoid.identity_map_enabled = false

# use a server that supports streaming
set :server, :thin

get '/' do
  content_type :json
  JsonArray.write_stream(self) do |json|
    Book.all.each do |book|
      json << book.attributes
    end
  end
end
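The comma-and-newline framing can be checked without a server. The sketch below is only a stand-in for the helper above, under a few assumptions: it swaps Yajl::Encoder for the standard library's JSON.generate and Sinatra's stream object for a StringIO, and FramedArrayWriter, write_framed_array and the sample book hashes are hypothetical names used for illustration.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in for StreamWriter: same comma/newline framing,
# but encoding with stdlib JSON instead of yajl.
class FramedArrayWriter
  def initialize(out)
    @out = out
    @first = true
  end

  # Writes one element, preceded by a comma for every element but the first.
  def <<(object)
    @out << ',' unless @first
    @out << JSON.generate(object)
    @out << "\n"
    @first = false
    self
  end
end

# Wraps the elements written by the block in '[' and ']'.
def write_framed_array(out)
  out << '['
  yield FramedArrayWriter.new(out)
  out << ']'
  out
end

# Made-up sample data standing in for Book.all
books = [{ 'title' => 'A' }, { 'title' => 'B' }, { 'title' => 'C' }]

io = StringIO.new
write_framed_array(io) { |json| books.each { |book| json << book } }

# The concatenated chunks should parse as one JSON array
parsed = JSON.parse(io.string)
puts parsed.length
```

Because the newline sits between an element and the following comma, it counts as insignificant whitespace, so a client that buffers the whole response can still parse it with any ordinary JSON parser.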
To decode on the client side you can read and parse the response in chunks, for example with em-http. Note that this solution requires the client's memory to be large enough to hold the entire array of objects. Here's the corresponding streamed parser helper:
require 'yajl'

module JsonArray
  class StreamParser
    def initialize(&callback)
      @parser = Yajl::Parser.new
      @parser.on_parse_complete = callback
    end

    def <<(str)
      @parser << str
    end
  end

  def self.parse_stream(&callback)
    StreamParser.new(&callback)
  end
end
Usage:
require 'em-http'

parser = JsonArray.parse_stream do |object|
  # block is called when we are done parsing the
  # entire array; now we can handle the data
  p object
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end
Alternative solution
You could actually simplify the whole thing a lot if you give up the requirement of generating a "proper" JSON array. What the above solution generates is JSON in this form:
[{ ... book_1 ... }
,{ ... book_2 ... }
,{ ... book_3 ... }
...
,{ ... book_n ... }
]
We could however stream each book as a separate JSON document, one per line, and thus reduce the format to the following:
{ ... book_1 ... }
{ ... book_2 ... }
{ ... book_3 ... }
...
{ ... book_n ... }
The code on the server would then be much simpler:
require 'sinatra'
require 'mongoid'
require 'yajl'

Mongoid.identity_map_enabled = false

set :server, :thin

get '/' do
  content_type :json
  encoder = Yajl::Encoder.new
  stream do |out|
    Book.all.each do |book|
      out << encoder.encode(book.attributes) << "\n"
    end
  end
end
As well as the client:
require 'em-http'
require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new do |book|
  # this will now be called separately for every book
  p book
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end
The great thing is that now the client does not have to wait for the entire response, but instead parses every book separately. However, this will not work if one of your clients expects one single big JSON array.
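If the client is not built on EventMachine, the same one-book-per-line format can be consumed with a small buffer that splits on newlines. This is a minimal sketch, assuming the server emits exactly one JSON document per newline-terminated line as above. LineParser is a hypothetical helper, and stdlib JSON is used here in place of yajl.

```ruby
require 'json'

# Hypothetical helper: collects arbitrary chunks and calls the block
# once for every complete newline-terminated line.
class LineParser
  def initialize(&callback)
    @buffer = +''
    @callback = callback
  end

  def <<(chunk)
    @buffer << chunk
    # Extract every complete line; a partial trailing line stays buffered
    while (line = @buffer.slice!(/\A[^\n]*\n/))
      line.strip!
      @callback.call(JSON.parse(line)) unless line.empty?
    end
    self
  end
end

books = []
parser = LineParser.new { |book| books << book }

# Chunk boundaries need not line up with line boundaries
parser << %({"title":"A"}\n{"tit)
parser << %(le":"B"}\n)

p books
```

With Net::HTTP, for instance, the chunks could come from calling response.read_body with a block inside a request_get block and pushing each chunk into the parser. Only one line at a time is held in memory, which is the main benefit over the array format.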
Source: https://stackoverflow.com/questions/24291519/efficient-way-to-render-ton-of-json-on-heroku