Question
I am trying to design a web API that gets data from an external server, subject to connection limits, and I'm trying to work out how to make it efficient.
My API has an endpoint that takes a username qualified by a domain, like tom@domain.com. The endpoint makes an HTTP call to the domain to get an auth token, then makes another call to that domain with the username to get some data, which is returned to the client. However, my API can accept multiple usernames (comma delimited, like ?users=tom@domain.a.com, bill@domain.b.com). For each domain, my web server knows the maximum number of parallel connections I can make to get the data.
So the problem is how to organize the data so I can maximize parallelism while staying within the limits.
Here are my thoughts:
First, parse the user list and group the users by domain. Then keep a static dictionary whose key is the domain and whose value is a custom object holding two queues. Both queues hold Tasks (from async/await), but the first queue's maximum length is the limit for that domain.
?users=bill@D.com, max@D.com, sarah@A.com, tom@D.com
dictionary = {
    "D.com" : [
        [],
        ["bill@D.com", "max@D.com", "tom@D.com"]
    ],
    "A.com" : [
        [],
        ["sarah@A.com"]
    ]
}
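As a rough sketch of the parse-and-group step described above: the class and method names here are hypothetical, and treating domains as case-insensitive is an assumption, since nothing above says how they should be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UserGrouping
{
    // Splits a comma-delimited "users" value and groups the entries by domain.
    // Entries without an '@' are skipped. Domain comparison is case-insensitive
    // (an assumption, not something stated in the question).
    public static Dictionary<string, List<string>> GroupByDomain(string users)
    {
        return users
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(u => u.Trim())
            .Where(u => u.Contains("@"))
            .GroupBy(u => u.Substring(u.LastIndexOf('@') + 1),
                     StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.ToList(),
                          StringComparer.OrdinalIgnoreCase);
    }
}
```

Calling UserGrouping.GroupByDomain with the query value above gives one entry per domain, each holding that domain's pending users in their original order.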
Then I can run some code every second that loops through all the dictionary values and moves as many Task objects as the limit allows from the second queue into the first (i.e. removing them from the second queue and adding them to the first).
As soon as a task is in the first queue, it executes using Parallel.Invoke(). When the task completes, it is removed from the first queue (unless some request is waiting for it, as explained in the next paragraph).
I do this because if another API request arrives at my endpoint with some names that are already in the first request, I want to reuse the work. So if a user is in the first queue, I call await on that Task.
Somehow, when a task finishes, I need to know that nobody else is waiting for that user's task, and in that case remove it from the first queue. Also, if a client disconnects, that client should stop watching its users.
Does anyone know if this is a good approach?
Answer 1:
Since the requests run in parallel, you will almost certainly need System.Collections.Concurrent, and since you need key/value lookup (user identifier/HTTP response), a ConcurrentDictionary fits. Because there is a common cache for all users, you will want to store it in a static variable, which is available to all threads and all HTTP requests.
Here is a simple example:
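using System;
using System.Collections.Concurrent;
using System.Net.Http;
using System.Threading.Tasks;

public class MyCacheClass
{
    // Store the in-flight or completed request for each user
    private static readonly ConcurrentDictionary<string, Task<HttpResponseMessage>> _cache =
        new ConcurrentDictionary<string, Task<HttpResponseMessage>>();
    // Share one HttpClient; creating one per request can exhaust sockets under load
    private static readonly HttpClient _httpClient = new HttpClient();

    // Get from the ConcurrentDictionary or add if it's not there.
    // Pass a delegate: GetOrAdd(key, GetResponse(key)) would start a new HTTP
    // request on every call, even on a cache hit. Under a race the delegate can
    // still run more than once; wrap the value in Lazy<T> if that matters.
    public async Task<HttpResponseMessage> GetUser(string key)
    {
        return await _cache.GetOrAdd(key, k => GetResponse(k));
    }

    // You just need to implement this method, potentially in a subclass, to get the data
    protected virtual async Task<HttpResponseMessage> GetResponse(string key)
    {
        var url = "http://www.google.com?q=" + Uri.EscapeDataString(key);
        return await _httpClient.GetAsync(url);
    }
}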
Then to get a user's information, just call:
var o = new MyCacheClass();
var userInfo = await o.GetUser(userID);
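The cache above does not by itself enforce the per-domain connection limit from the question. One possible way to add it, sketched here under assumptions, is a SemaphoreSlim per domain, which lets callers wait asynchronously for a free slot without the two queues and the once-a-second polling loop. DomainThrottle and the limitFor lookup are hypothetical names; how the limits are actually stored is not specified in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class DomainThrottle
{
    // One semaphore per domain; its initial count is that domain's connection limit.
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates =
        new ConcurrentDictionary<string, SemaphoreSlim>(StringComparer.OrdinalIgnoreCase);

    // Hypothetical lookup supplied by the caller: domain -> max parallel connections.
    private readonly Func<string, int> _limitFor;

    public DomainThrottle(Func<string, int> limitFor)
    {
        _limitFor = limitFor;
    }

    // Runs 'work' once a slot for 'domain' is free and releases the slot afterwards,
    // whether 'work' succeeds or throws.
    public async Task<T> RunAsync<T>(string domain, Func<Task<T>> work)
    {
        var gate = _gates.GetOrAdd(domain, d => new SemaphoreSlim(_limitFor(d)));
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await work().ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Inside GetResponse, the HTTP call could then be wrapped as throttle.RunAsync(domain, () => httpClient.GetAsync(url)), where throttle is a shared (for example static) DomainThrottle instance and domain is the part of the key after the @.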
Note: If you're going to use code like this on a production system, you might consider adding some means of purging or trimming the cache after a period of time or when it reaches a certain size. Otherwise your solution may not scale the way you need it to.
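As one illustration of the purging mentioned in the note, here is a minimal sketch of a task cache whose entries are dropped a fixed time after they complete, and immediately if they fail. ExpiringTaskCache and its members are hypothetical names, and a fixed time-to-live is only one of several reasonable eviction policies.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class ExpiringTaskCache<T>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> _cache =
        new ConcurrentDictionary<string, Lazy<Task<T>>>();
    private readonly TimeSpan _ttl;

    public ExpiringTaskCache(TimeSpan ttl)
    {
        _ttl = ttl;
    }

    // Returns the cached task for 'key', starting it via 'factory' if absent.
    // Lazy<T> makes sure the factory runs once per stored entry, even if
    // several threads race inside GetOrAdd.
    public Task<T> GetOrAdd(string key, Func<string, Task<T>> factory)
    {
        var entry = _cache.GetOrAdd(key,
            k => new Lazy<Task<T>>(() => StartAndScheduleEviction(k, factory)));
        return entry.Value;
    }

    private Task<T> StartAndScheduleEviction(string key, Func<string, Task<T>> factory)
    {
        var task = factory(key);
        var ignored = EvictLaterAsync(key, task); // fire and forget
        return task;
    }

    private async Task EvictLaterAsync(string key, Task<T> task)
    {
        try
        {
            await task.ConfigureAwait(false);
            await Task.Delay(_ttl).ConfigureAwait(false);
        }
        catch
        {
            // A failed lookup is evicted right away so the error is not cached.
        }
        Lazy<Task<T>> removed;
        _cache.TryRemove(key, out removed);
    }
}
```

Callers awaiting the same key while an entry is cached share one task, and the next call after eviction starts a fresh request.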
Source: https://stackoverflow.com/questions/51737057/how-to-design-parallel-web-api-in-c