How can I get the HTML from a page with Cloudflare DDoS protection?

不知归路 2020-12-08 22:54

I use HtmlAgilityPack to get web page data, but nothing I have tried works on a page that uses Cloudflare (www.cloudflare.com) DDoS protection. The redirect page is not possible to handle in htmlagili

3 Answers
  •  -上瘾入骨i
    2020-12-08 23:27

    A "simple" working method to bypass Cloudflare if you don't use libraries (which sometimes do not work).

    1. Open a "hidden" WebBrowser (size 1,1 or so).
    2. Open the root of your target Cloudflare site.
    3. Get the cookies from WebBrowser.
    4. Use these cookies in WebClient.

    Make sure the UserAgent is identical for both WebBrowser and WebClient. Cloudflare will give you a 503 if the WebClient later sends a UserAgent that does not match.

    You will need to search Stack Overflow for how to get cookies from WebBrowser and how to modify WebClient so you can set its CookieContainer, and for how to modify the UserAgent on one or both so they are identical.
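
    As a rough illustration of steps 3 and 4, here is a minimal sketch of a WebClient subclass that carries a CookieContainer and a fixed UserAgent, plus a helper that copies a "name=value; name2=value2" string (the format returned by WebBrowser.Document.Cookie) into that container. The names CookieAwareWebClient and BrowserCookies are hypothetical, not part of any library. Note that Document.Cookie omits HttpOnly cookies; if the clearance cookie is marked HttpOnly, the WinINet InternetGetCookieEx function with the INTERNET_COOKIE_HTTPONLY flag is the usual way to read it instead.

    ```csharp
    using System;
    using System.Net;

    // Hypothetical helper: a WebClient that sends a CookieContainer and a fixed User-Agent.
    public class CookieAwareWebClient : WebClient
    {
        public CookieContainer Cookies { get; } = new CookieContainer();

        // Should be the exact string the WebBrowser control sent.
        public string UserAgent { get; set; }

        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest request = base.GetWebRequest(address);
            if (request is HttpWebRequest http)
            {
                http.CookieContainer = Cookies;
                if (!string.IsNullOrEmpty(UserAgent))
                    http.UserAgent = UserAgent;
            }
            return request;
        }
    }

    public static class BrowserCookies
    {
        // Copies "name=value; name2=value2" pairs into the container for the given site.
        public static void CopyInto(CookieContainer target, Uri site, string cookieHeader)
        {
            var separators = new[] { ';' };
            foreach (string pair in cookieHeader.Split(separators, StringSplitOptions.RemoveEmptyEntries))
            {
                int eq = pair.IndexOf('=');
                if (eq <= 0) continue; // skip malformed entries
                string name = pair.Substring(0, eq).Trim();
                string value = pair.Substring(eq + 1).Trim();
                target.Add(new Cookie(name, value, "/", site.Host));
            }
        }
    }
    ```

    Usage would then be along the lines of: create the client, set its UserAgent, call BrowserCookies.CopyInto(client.Cookies, siteUri, browser.Document.Cookie) once the hidden WebBrowser has finished loading, and fetch the page with client.DownloadString(siteUri).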

    Since the cookies from Cloudflare seem to never expire, you can serialize them to temporary storage and load them each time you run your app, perhaps with a verification step that refetches them if a request fails.
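
    One possible way to persist the cookies between runs is sketched below. It writes each cookie for a site as a tab-separated line and rebuilds a CookieContainer from that file. CookieStore and the file layout are assumptions made for illustration; expiry dates are not stored, so reloaded cookies behave as session cookies.

    ```csharp
    using System;
    using System.IO;
    using System.Linq;
    using System.Net;

    // Hypothetical helper: saves and restores the cookies for one site as plain text.
    public static class CookieStore
    {
        public static void Save(CookieContainer cookies, Uri site, string path)
        {
            // One line per cookie: name, value, domain, path.
            var lines = cookies.GetCookies(site).Cast<Cookie>()
                .Select(c => string.Join("\t", c.Name, c.Value, c.Domain, c.Path));
            File.WriteAllLines(path, lines);
        }

        public static CookieContainer Load(string path)
        {
            var cookies = new CookieContainer();
            if (!File.Exists(path)) return cookies; // nothing saved yet

            foreach (string line in File.ReadAllLines(path))
            {
                string[] fields = line.Split('\t');
                if (fields.Length != 4) continue; // skip malformed lines
                cookies.Add(new Cookie(fields[0], fields[1], fields[3], fields[2]));
            }
            return cookies;
        }
    }
    ```

    For the verification step, a WebException carrying a 503 response on the first request is a reasonable signal that the stored cookies are no longer accepted and the hidden WebBrowser should fetch new ones.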

    I have been doing this for a while and it works quite well. I could not get the C# libraries to work for one specific Cloudflare site, although they worked on others. No clue why yet.

    This also works behind the scenes on an IIS server, but you will have to set up "frowned upon" settings. That is, run the app pool as SYSTEM or ADMIN and set it to Classic mode.
