C# headless browser with JavaScript support for crawler

Submitted by 独自空忆成欢 on 2019-11-30 08:36:59

Question


Could anyone suggest a headless browser for .NET that supports cookies and automatic JavaScript execution?


Answer 1:


Selenium+HtmlUnitDriver/GhostDriver is exactly what you are looking for. Put simply, Selenium is a library for driving a variety of browsers for automation purposes: testing, scraping, task automation.

There are different WebDriver classes with which you can operate an actual browser. HtmlUnitDriver is a headless one. GhostDriver is a WebDriver for PhantomJS, so you can write C# while PhantomJS actually does the heavy lifting.

The code snippet below is from the Selenium docs for Firefox, but the code for GhostDriver (PhantomJS) or HtmlUnitDriver is almost identical.

using System; // needed for TimeSpan below
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
using OpenQA.Selenium.Support.UI;

class GoogleSuggest
{
    static void Main(string[] args)
    {
        // driver initialization varies across different drivers
        // but they all support parameter-less constructors
        IWebDriver driver = new FirefoxDriver();
        driver.Navigate().GoToUrl("http://www.google.com/");


        IWebElement query = driver.FindElement(By.Name("q"));
        query.SendKeys("Cheese");
        query.Submit();

        WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        wait.Until((d) => { return d.Title.ToLower().StartsWith("cheese"); });

        System.Console.WriteLine("Page title is: " + driver.Title);

        driver.Quit();
    }
}

If you run this on a Windows machine, you can use the actual Firefox or Chrome driver: it opens a real browser window that behaves as programmed in your C#. HtmlUnitDriver is the lightest and fastest option.

I have successfully run Selenium for C# (FirefoxDriver) on Linux using Mono. I expect HtmlUnitDriver to work just as well as the others, so if you need speed, I suggest Mono (you can develop, test and compile with Visual Studio on Windows, no problem) plus Selenium HtmlUnitDriver running on a Linux host without a desktop.
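
Since the question asks specifically about cookies and JavaScript execution, here is a minimal sketch of how both might look through the Selenium C# bindings. It is an illustration, not part of the original answer: the class name CrawlerSketch, its method names, and the cookie name and value are hypothetical, and FirefoxDriver is used only because the snippet above uses it. It also assumes the driver in use implements IJavaScriptExecutor, as the standard browser drivers do.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

// Hypothetical helper class, for illustration only.
static class CrawlerSketch
{
    // Runs a JavaScript snippet in the page currently loaded by the driver
    // and returns whatever the script returns.
    public static object RunScript(IWebDriver driver, string script)
    {
        // Assumption: the concrete driver implements IJavaScriptExecutor.
        return ((IJavaScriptExecutor)driver).ExecuteScript(script);
    }

    // Prints every cookie the browser holds for the current page.
    public static void PrintCookies(IWebDriver driver)
    {
        foreach (Cookie cookie in driver.Manage().Cookies.AllCookies)
        {
            Console.WriteLine(cookie.Name + " = " + cookie.Value);
        }
    }

    // Loads a page, sets a cookie, lists cookies, and counts links via JavaScript.
    public static void Crawl(string url)
    {
        IWebDriver driver = new FirefoxDriver();
        try
        {
            driver.Navigate().GoToUrl(url);

            // Cookies can only be added for the domain of the loaded page.
            // The name and value here are placeholders.
            driver.Manage().Cookies.AddCookie(new Cookie("crawler_session", "example-value"));
            PrintCookies(driver);

            // Page scripts have already run in the browser; this evaluates
            // an extra expression against the resulting DOM.
            object linkCount = RunScript(driver, "return document.links.length;");
            Console.WriteLine("Links on page: " + linkCount);
        }
        finally
        {
            // Always shut the browser down, even if a step above throws.
            driver.Quit();
        }
    }
}
```

Calling CrawlerSketch.Crawl("http://www.google.com/") from a Main method would exercise it. Because the helpers only depend on the IWebDriver interface, swapping in another driver should only require changing the line that constructs it.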




Answer 2:


I am not aware of a .NET-based headless browser, but there is always PhantomJS, which is written in C/C++ and works fairly well for assisting in unit testing of JS with QUnit.

There is also another relevant question here that might help you: Headless browser for C# (.NET)?



Source: https://stackoverflow.com/questions/15254817/c-sharp-headless-browser-with-javascript-support-for-crawler
