I\'m working on a Python library that interfaces with a web service API. Like many web services I\'ve encountered, this one requests limiting the rate of requests. I would l
Your rate limiting scheme should be heavily influenced by the calling conventions of the underlying code (syncronous or async), as well as what scope (thread, process, machine, cluster?) this rate-limiting will operate at.
I would suggest keeping all the variables within the instance, so you can easily implement multiple periods/rates of control.
Lastly, it sounds like you want to be a middleware component. Don't try to be an application and introduce threads on your own. Just block/sleep if you are synchronous and use the async dispatching framework if you are being called by one of them.