Using CefSharp.Offscreen to retrieve a web page that requires JavaScript to render

情话喂你 2020-12-29 00:16

I have what is hopefully a simple task, but it's going to take someone who's versed in CefSharp to solve it.

I have a URL that I want to retrieve the HTML from.

2 answers
  •  轮回少年
    2020-12-29 00:55

    If you can't get a headless version of Chromium to work, you could try Node.js and jsdom. Both are easy to install and experiment with once Node is up and running. The README on GitHub has simple examples that pull down a URL, run all of its JavaScript, including any custom JavaScript code (for example, jQuery snippets that count certain elements), and leave the HTML in memory to do with as you want. You can then call $('body').html() to get a string, as in your pseudocode. (This even works for things like generated SVG graphics, since that is just more XML tree nodes.)

    If you need this as part of a larger C# app that you need to distribute, your idea to use CefSharp.Offscreen sounds reasonable. One approach might be to get things working with CefSharp.WinForms or CefSharp.WPF first, where you can literally see things, then try CefSharp.Offscreen later when this all works. You can even get some JavaScript running in the on-screen browser to pull down body.innerHTML and return it as a string to the C# side of things before you go headless. If that works, the rest should be easy.

    Perhaps start with CefSharp.MinimalExample and get that compiling, then tweak it for your needs. You need to be able to set webBrowser.Address in your C# code, know when the page has loaded, and then call webBrowser.EvaluateScriptAsync(".. JS code ..") with your JavaScript code (as a string), which does what was described above (returning bodyElement.innerHTML as a string).
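
    The steps in the answer above (load a URL, wait for the page to finish loading, evaluate JavaScript that returns the body HTML) might look roughly like the sketch below. It is not taken from the original post. It assumes the CefSharp.OffScreen NuGet package (the namespace is spelled CefSharp.OffScreen), default CefSettings, and a placeholder URL, and it uses Load-style navigation through the constructor rather than an Address property. The helper name GetRenderedHtmlAsync is made up for illustration. Exact constructor parameters and platform requirements vary between CefSharp versions, so treat it as a starting point.

```csharp
using System;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.OffScreen;

public static class Program
{
    public static int Main(string[] args)
    {
        // Placeholder URL, for illustration only.
        const string url = "https://example.com/";

        // CEF has to be initialized once per process before a browser is created.
        // Initialize and Shutdown are kept on the same (main) thread.
        Cef.Initialize(new CefSettings());
        try
        {
            // Blocking here assumes CEF's default multi-threaded message loop,
            // so the browser keeps working on its own threads.
            string html = GetRenderedHtmlAsync(url).GetAwaiter().GetResult();
            Console.WriteLine(html);
            return 0;
        }
        finally
        {
            Cef.Shutdown();
        }
    }

    // Hypothetical helper: loads the page off-screen and returns body.innerHTML.
    private static async Task<string> GetRenderedHtmlAsync(string url)
    {
        using (var browser = new ChromiumWebBrowser(url))
        {
            var loaded = new TaskCompletionSource<bool>(
                TaskCreationOptions.RunContinuationsAsynchronously);

            // Complete the task the first time the browser reports IsLoading == false.
            EventHandler<LoadingStateChangedEventArgs> handler = null;
            handler = (sender, e) =>
            {
                if (!e.IsLoading)
                {
                    browser.LoadingStateChanged -= handler;
                    loaded.TrySetResult(true);
                }
            };
            browser.LoadingStateChanged += handler;

            await loaded.Task;

            // Run JavaScript in the page and bring the result back to C# as a string.
            JavascriptResponse response =
                await browser.EvaluateScriptAsync("document.body.innerHTML");
            if (!response.Success)
            {
                throw new InvalidOperationException(response.Message);
            }
            return (string)response.Result;
        }
    }
}
```

    Note that IsLoading turning false only means the browser has finished loading. A page that keeps rendering content from scripts afterwards (timers, late AJAX calls) may need an extra delay or a check for a specific element before the HTML is read.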
