Question
I have a cron job that calls a vendor API to fetch a list of companies. Once the data is fetched, we store it in Cloud Datastore as shown in the code below. For the last two days, whenever I trigger the cron job I see the error below. When I debug the code locally, I don't see this error.
company_list = cron.rest_client.load(config, "companies", '')
if not company_list:
    logging.info("Company list is empty")
    return "Ok"
for row in company_list:
    company_repository.save(row, original_data_source, actual_data_source)
Repository code
def save(dto, org_ds, act_dp):
    try:
        key = 'FIN/%s' % (dto['ticker'])
        company = CompanyInfo(id=key)
        company.stock_code = key
        company.ticker = dto['ticker']
        company.name = dto['name']
        company.original_data_source = org_ds
        company.actual_data_provider = act_dp
        company.put()
        return company
    except Exception:
        logging.exception("company_repository: error occurred saving the company record")
        raise
Error
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded.
Exception details
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
    result = handler(dict(self._environ), self._StartResponse)
  File "/base/data/home/apps/p~svasti-173418/internal-api:20170808t160537.403249868819304873/lib/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/base/data/home/apps/p~svasti-173418/internal-api:20170808t160537.403249868819304873/lib/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/base/data/home/apps/p~svasti-173418/internal-api:20170808t160537.403249868819304873/lib/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/base/data/home/apps/p~svasti-173418/internal-api:20170808t160537.403249868819304873/lib/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/base/data/home/apps/p~svasti-173418/internal-api:20170808t160537.403249868819304873/internal/cron/company_list.py", line 21, in run
    company_repository.save(row, original_data_source, actual_data_source)
  File "/base/data/home/apps/p~svasti-173418/internal-api:20170808t160537.403249868819304873/internal/repository/company_repository.py", line 13, in save
    company.put()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3458, in _put
    return self._put_async(**ctx_options).get_result()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 378, in check_success
    self.wait()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 362, in wait
    if not ev.run1():
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 268, in run1
    delay = self.run0()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 248, in run0
    _logging_debug('rpc: %s.%s', rpc.service, rpc.method)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 453, in service
    @property
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded.
Answer 1:
Has your company list been getting bigger? How many entities are you trying to put?
Try saving them as a batch instead of sequentially in a loop: remove company.put() from save(dto, org_ds, act_dp) so it only builds and returns the entity, then call ndb.put_multi() on the collected entities afterwards.
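A sketch of the revised save() under that change, mirroring the CompanyInfo model and fields from the question (this fragment lives in the repository module and runs only inside the App Engine runtime):

def save(dto, org_ds, act_dp):
    # Build the entity but do NOT put() it here; the caller
    # collects the returned entities and flushes them in one
    # ndb.put_multi() call.
    key = 'FIN/%s' % (dto['ticker'])
    company = CompanyInfo(id=key)
    company.stock_code = key
    company.ticker = dto['ticker']
    company.name = dto['name']
    company.original_data_source = org_ds
    company.actual_data_provider = act_dp
    return company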
company_list = cron.rest_client.load(config, "companies", '')
if not company_list:
    logging.info("Company list is empty")
    return "Ok"

company_objs = []
for row in company_list:
    company_objs.append(company_repository.save(row, original_data_source, actual_data_source))
    # put 500 at a time
    if len(company_objs) >= 500:
        ndb.put_multi(company_objs)
        company_objs = []

# put any remainders
if len(company_objs) > 0:
    ndb.put_multi(company_objs)
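The batch-and-flush pattern above is easy to get subtly wrong (e.g. an off-by-one around the 500 limit). A plain-Python sketch of the same pattern as a reusable helper — chunked is a name introduced here for illustration, not part of ndb:

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # any remainders
        yield batch
```

The cron loop then reduces to iterating over chunked(company_objs, 500) and calling ndb.put_multi() on each batch.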
Answer 2:
My answer builds on Alex's, but runs asynchronously. I've replaced put_multi() with put_multi_async(): by calling the async equivalent, the application can move on to building the next batch right away instead of blocking on put_multi(). I also added the @ndb.toplevel decorator, which tells the handler not to exit until its asynchronous requests have finished.
If your data grows bigger, you may want to look deeper into the deferred library. It can be used to respawn a task every X batches, passing along the rest of your unprocessed data.
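A minimal sketch of that deferred approach, assuming the same company_repository as above (process_companies and BATCH_SIZE are hypothetical names, and this runs only inside the App Engine Python 2.7 runtime):

from google.appengine.ext import deferred, ndb

BATCH_SIZE = 500  # hypothetical constant

def process_companies(rows, org_ds, act_dp):
    # Handle one slice of the list, then re-enqueue the remainder.
    batch = [company_repository.save(r, org_ds, act_dp)
             for r in rows[:BATCH_SIZE]]
    ndb.put_multi(batch)
    rest = rows[BATCH_SIZE:]
    if rest:
        # Each deferred task runs as its own request with a fresh
        # deadline, so a long list never hits the overall limit.
        deferred.defer(process_companies, rest, org_ds, act_dp)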
@ndb.toplevel
def fetch_companies_list():
    company_list = cron.rest_client.load(config, "companies", '')
    if not company_list:
        logging.info("Company list is empty")
        return "Ok"

    company_objs = []
    for row in company_list:
        company_objs.append(company_repository.save(row, original_data_source, actual_data_source))
        # put 500 at a time
        if len(company_objs) >= 500:
            ndb.put_multi_async(company_objs)
            company_objs = []

    # put any remainders
    if len(company_objs) > 0:
        ndb.put_multi_async(company_objs)
Source: https://stackoverflow.com/questions/45594018/deadlineexceedederror-the-overall-deadline-for-responding-to-the-http-request-w