How to get a website's title from C#

Submitted by 放荡痞女 on 2019-11-30 06:42:59
Timothy Khouri

A simpler way to get the content:

// Requires: using System.Net;
string source;
using (WebClient x = new WebClient()) // dispose the client once the download finishes
    source = x.DownloadString("http://www.singingeels.com/");

A simpler, more reliable way to get the title:

string title = Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>",
    RegexOptions.IgnoreCase).Groups["Title"].Value;
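
For reuse, the same pattern can be wrapped in a small helper. This is only a minimal sketch: the class name TitleExtractor and the method name ExtractTitle are made up for illustration, and the entity decoding and whitespace cleanup are additions that the snippet above does not perform.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the original answer.
public static class TitleExtractor
{
    // Same pattern as in the answer, without the redundant escapes on < and >.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the text of the first <title> element, or an empty string if none is found.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        Match match = TitlePattern.Match(html);
        if (!match.Success) return string.Empty;

        // Assumption: decoding entities such as &amp; and collapsing runs of
        // whitespace is wanted; the original one-liner returns the raw text.
        string decoded = WebUtility.HtmlDecode(match.Groups["Title"].Value);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Calling TitleExtractor.ExtractTitle(source) with the string downloaded earlier would then give the page title, or an empty string when the page has no title element.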

Perhaps this suggestion opens up a new world for you. I also had this question and arrived at the following solution.

Download "Html Agility Pack" from http://html-agility-pack.net/?z=codeplex

Or get it from NuGet: https://www.nuget.org/packages/HtmlAgilityPack/ and add it to your project as a reference.

Add the following using directive to the code file:

using HtmlAgilityPack;

Write the following code in your method:

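var webGet = new HtmlWeb();
var document = webGet.Load(url); // url is the address of the page to read
// SelectSingleNode returns null when the page has no <title>, so guard the access
var title = document.DocumentNode.SelectSingleNode("html/head/title")?.InnerText;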

Sources:

https://codeshare.co.uk/blog/how-to-scrape-meta-data-from-a-url-using-htmlagilitypack-in-c/
HtmlAgilityPack obtain Title and meta

In order to accomplish this, you are going to need to do a couple of things.

  • Make your app threaded, so that you can process multiple requests at a time and maximize the number of HTTP requests being made.
  • During the async request, download only the amount of data you actually need; you could probably parse the data as it comes back, looking for
  • You will probably want to use a regex to pull out the title.

I have done this before with SEO bots and was able to handle almost 10,000 requests at one time. You just need to make sure that each web request is self-contained in its own thread.
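
The points above could be sketched roughly as follows. Note that this swaps explicit threads for async tasks throttled by a SemaphoreSlim, which is a different technique from the one the answer describes. The names TitleCrawler, ReadTitleAsync, FetchTitleAsync and FetchAllAsync, the 64 KB read cap and the concurrency limit of 20 are all assumptions for illustration, not values from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; all names and limits here are illustrative assumptions.
public static class TitleCrawler
{
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>", RegexOptions.IgnoreCase);

    // Reads markup in 4 KB chunks and stops at the first complete <title> element,
    // or once roughly maxChars characters have been read without finding one.
    public static async Task<string> ReadTitleAsync(TextReader reader, int maxChars = 64 * 1024)
    {
        var seen = new StringBuilder();
        var chunk = new char[4096];
        while (seen.Length < maxChars)
        {
            int read = await reader.ReadAsync(chunk, 0, chunk.Length);
            if (read == 0) break; // end of input
            seen.Append(chunk, 0, read);
            // Re-scanning the whole buffer is simple rather than fast; the cap keeps it bounded.
            Match match = TitlePattern.Match(seen.ToString());
            if (match.Success) return match.Groups["Title"].Value.Trim();
        }
        return string.Empty;
    }

    // Requests one page and reads only as much of the body as ReadTitleAsync needs.
    public static async Task<string> FetchTitleAsync(HttpClient client, string url)
    {
        // ResponseHeadersRead returns before the whole body has been buffered.
        using (HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (Stream body = await response.Content.ReadAsStreamAsync())
            using (var reader = new StreamReader(body)) // assumes UTF-8 compatible pages
            {
                return await ReadTitleAsync(reader);
            }
        }
    }

    // Fetches many titles with at most maxConcurrency requests in flight at once.
    public static async Task<IDictionary<string, string>> FetchAllAsync(
        IEnumerable<string> urls, int maxConcurrency = 20)
    {
        using (var client = new HttpClient())
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Distinct().Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    return new KeyValuePair<string, string>(url, await FetchTitleAsync(client, url));
                }
                catch (HttpRequestException)
                {
                    // Only HTTP failures are swallowed; such pages get an empty title.
                    return new KeyValuePair<string, string>(url, string.Empty);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            KeyValuePair<string, string>[] results = await Task.WhenAll(tasks);
            return results.ToDictionary(p => p.Key, p => p.Value);
        }
    }
}
```

A caller might use it as var titles = await TitleCrawler.FetchAllAsync(urls); and then look up titles[url]. Keeping the stream parsing in ReadTitleAsync, separate from the HTTP code, means it can be exercised with a StringReader without touching the network.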
