Dynamically change proxy in HttpClient without hard cpu usage

孤者浪人 submitted on 2019-12-05 05:15:12

Question


I need to create a multithreaded application that makes HTTP requests (POST, GET, etc.). For this purpose I chose HttpClient.

By default it does not support SOCKS proxies. I found that Sockshandler (https://github.com/extremecodetv/SocksSharp) can be used in place of the basic HttpClientHandler. It allows me to use SOCKS proxies.

But I have a problem. All my requests should be sent through different proxies, which I have parsed from the internet. But httpclient handler doesn't support changing proxies dynamically. If I don't have a valid proxy, I need to recreate the HttpClient. That is OK, but if I have 200 threads, it takes a lot of cpu. So what should I do in this situation?

And second problem. I found this article (https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/), which says to use HttpClient as a single instance for better performance, but that seems impossible in a multithreaded program. Which way is better in this case?

Thanks for the help.


Answer 1:


httpclient handler doesn't support changing proxies dynamically.

I'm not sure if that's technically true. Proxy is a read/write property so I believe you could change it (unless that results in a runtime error...I haven't actually tried it to be honest).

UPDATE: I have tried it now and your assertion is technically true. In the sample below, the line that updates UseProxy will fail with "System.InvalidOperationException: 'This instance has already started one or more requests. Properties can only be modified before sending the first request.'" Confirmed on .NET Core and full framework.

var hch = new HttpClientHandler { UseProxy = false };
var hc = new HttpClient(hch);
var resp = await hc.GetAsync(someUri);

hch.UseProxy = true; // fail!
hch.Proxy = new WebProxy(someProxy);
resp = await hc.GetAsync(someUri);

Either way, what is true is that you can't set a different proxy per request in a thread-safe way, and that's unfortunate.

if I have 200 threads, it takes a lot of cpu

Concurrent asynchronous HTTP calls should not consume extra threads nor CPU. Fire them off using await Task.WhenAll or similar and there is no thread consumed until a response is returned.
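As a minimal sketch of that idea (the class and method names here are made up for illustration, not taken from the question), every request is started first and then awaited together, so no thread sits blocked while responses are in flight:

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class ConcurrentRequests
{
    // Hypothetical helper: starts one GET per URL, then awaits them all.
    public static async Task<int[]> FetchStatusCodesAsync(HttpClient client, string[] urls)
    {
        // ToList() forces the lazy Select to run now, so every request
        // has been started before we await any of them.
        var tasks = urls.Select(async url =>
        {
            using (var response = await client.GetAsync(url))
            {
                return (int)response.StatusCode;
            }
        }).ToList();

        // Results come back in the same order as the input URLs.
        return await Task.WhenAll(tasks);
    }
}
```

Note that if any single request throws (a dead proxy, for example), the awaited Task.WhenAll rethrows, so in practice you would probably catch HttpRequestException inside the lambda.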

And second problem. I found this article...

That's definitely something you need to look out for. However, even if you could set a different proxy per request, the underlying network stack would still need to open a socket for each proxy, so you wouldn't be gaining anything over an HttpClient instance per proxy in terms of the socket exhaustion problem.

The best solution depends on just how many proxies you're talking about here. In the article, the author describes running into problems when the server hit around 4000-5000 open sockets, and no problems around 400 or less. YMMV, but if the number of proxies is no more than a few hundred, you should be safe creating a new HttpClient instance per proxy. If it's more, I would look at throttling your concurrency and testing until you find a number where your server resources can keep up. In any case, make sure that if you need to make multiple calls to the same proxy, you're re-using HttpClient instances for them. A ConcurrentDictionary could be useful for lazily creating and reusing those instances.
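A rough sketch of that ConcurrentDictionary idea might look like the following. ProxyClientCache and GetClient are hypothetical names, and the proxy is assumed to be identified by its address string. It uses a plain HttpClientHandler with WebProxy, but the same pattern should work with any other handler you construct per proxy:

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

public sealed class ProxyClientCache
{
    // Lazy<T> makes sure only one HttpClient is ever built per proxy address,
    // even if several threads ask for the same proxy at the same moment.
    private readonly ConcurrentDictionary<string, Lazy<HttpClient>> clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>();

    public HttpClient GetClient(string proxyAddress)
    {
        return this.clients.GetOrAdd(
            proxyAddress,
            address => new Lazy<HttpClient>(() => CreateClient(address))).Value;
    }

    private static HttpClient CreateClient(string address)
    {
        // The proxy is set once, before the first request, which avoids the
        // InvalidOperationException shown earlier.
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(address),
            UseProxy = true,
        };
        return new HttpClient(handler, disposeHandler: true);
    }
}
```

Calling GetClient twice with the same address returns the same HttpClient instance, so connections to that proxy get reused instead of being reopened.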




Answer 2:


I agree with Todd Menier's answer. But if you use .NET Core, I suggest reading this and this article, where Microsoft says:

Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. That issue will result in SocketException errors.

It's sad, but they provide a solution:

To address those mentioned issues and make the management of HttpClient instances easier, .NET Core 2.1 introduced a new HttpClientFactory that can also be used to implement resilient HTTP calls by integrating Polly with it.

I looked at the summary block of IHttpClientFactory and saw this:

Each call to System.Net.Http.IHttpClientFactory.CreateClient(System.String) is guaranteed to return a new System.Net.Http.HttpClient instance. Callers may cache the returned System.Net.Http.HttpClient instance indefinitely or surround its use in a using block to dispose it when desired. The default System.Net.Http.IHttpClientFactory implementation may cache the underlying System.Net.Http.HttpMessageHandler instances to improve performance. Callers are also free to mutate the returned System.Net.Http.HttpClient instance's public properties as desired.

Let's look at the picture:

An IHttpClientFactory implementation is injected into some service (CatalogueService or whatever you made), and an HttpClient is then instantiated via IHttpClientFactory every time you need to make a request (you can even wrap it in a using(...) block), but the HttpMessageHandler will be cached in some kind of connection pool.

So you can use HttpClientFactory to create as many HttpClient instances as you need and set the proxy before you make a call. I'd be glad if it helps.

UPDATE: I tried it out and it is not actually what you need. You can implement your own IHttpClientFactory like this:

using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

public class Program
{
    public interface IHttpClientFactory
    {
        HttpClient CreateClientWithProxy(IWebProxy webProxy);
    }

    internal class HttpClientFactory : IHttpClientFactory
    {
        private readonly Func<HttpClientHandler> makeHandler;

        public HttpClientFactory(Func<HttpClientHandler> makeHandler)
        {
            this.makeHandler = makeHandler;
        }

        public HttpClient CreateClientWithProxy(IWebProxy webProxy)
        {
            var handler = this.makeHandler();
            handler.Proxy = webProxy;
            return new HttpClient(handler, true);
        }
    }

    internal class CachedHttpClientFactory : IHttpClientFactory
    {
        private readonly IHttpClientFactory httpClientFactory;

        // Keyed by the proxy object itself rather than by its hash code:
        // two different proxies can share a hash code, which would hand back
        // a client that is bound to the wrong proxy.
        private readonly Dictionary<IWebProxy, HttpClient> cache = new Dictionary<IWebProxy, HttpClient>();

        public CachedHttpClientFactory(IHttpClientFactory httpClientFactory)
        {
            this.httpClientFactory = httpClientFactory;
        }

        public HttpClient CreateClientWithProxy(IWebProxy webProxy)
        {
            lock (this.cache)
            {
                if (this.cache.TryGetValue(webProxy, out var cached))
                {
                    return cached;
                }

                var result = this.httpClientFactory.CreateClientWithProxy(webProxy);
                this.cache.Add(webProxy, result);
                return result;
            }
        }
    }

    public static void Main(string[] args)
    {
        var httpClientFactory = new HttpClientFactory(() => new HttpClientHandler
        {
            UseCookies = true,
            UseDefaultCredentials = true,
        });

        var cachedhttpClientFactory = new CachedHttpClientFactory(httpClientFactory);
        var proxies = new[] {
            new WebProxy()
            {
                Address = new Uri("https://contoso.com"),
            },
            new WebProxy()
            {
                Address = new Uri("https://microsoft.com"),
            },
        };

        foreach (var item in proxies)
        {
            var client = cachedhttpClientFactory.CreateClientWithProxy(item);
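            // Block until the response arrives; otherwise Main can return
            // before the request has completed.
            client.GetAsync("http://someAddress.com").GetAwaiter().GetResult();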
        }
    }
}

But be careful with large collections of WebProxy instances, which can occupy all the connections in the pool.
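One possible way to keep a large proxy list from opening too many connections at once is to cap how many requests are in flight. The sketch below is only an illustration of that throttling idea; Throttle.RunAsync and its parameters are hypothetical names, and the right limit has to be found by testing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `work` for every item, but lets at most `maxConcurrency`
    // invocations be in flight at any one time.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        int maxConcurrency,
        Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Waits asynchronously for a free slot; no thread is blocked.
                await gate.WaitAsync();
                try
                {
                    return await work(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // Completes only after every task has finished, so the
            // semaphore is not disposed while it is still in use.
            return await Task.WhenAll(tasks);
        }
    }
}
```

With the factory above, a call could look like Throttle.RunAsync(proxies, 50, p => cachedhttpClientFactory.CreateClientWithProxy(p).GetAsync("http://someAddress.com")), where 50 is just a placeholder value.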



Source: https://stackoverflow.com/questions/49818605/dynamically-change-proxy-in-httpclient-without-hard-cpu-usage
