How to get a website's title from C#

I'm revisiting some old code of mine and have stumbled upon a method for getting the title of a website based on its URL. It's not really what you would call a stable method...

3 Answers

执笔经年

2020-12-15 07:49

A simpler way to get the content:

WebClient x = new WebClient();
string source = x.DownloadString("http://www.singingeels.com/");

A simpler, more reliable way to get the title:

string title = Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>",
    RegexOptions.IgnoreCase).Groups["Title"].Value;
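
The two snippets above can be combined into one helper. What follows is only a sketch under a few assumptions: the names `TitleFetcher`, `ExtractTitle` and `GetTitleAsync` are made up for illustration, and `HttpClient` is swapped in for `WebClient` (which newer .NET versions mark as obsolete). The regex is the same one, minus the backslashes before `<` and `>`, which .NET does not need. It also decodes HTML entities and collapses whitespace, which the original one-liner does not do.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class TitleFetcher
{
    // One shared HttpClient for the process; creating one per request can exhaust sockets.
    private static readonly HttpClient Http = new HttpClient();

    // Same pattern as in the answer, without the redundant escapes.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the decoded title text, or null when the markup has no <title> element.
    public static string ExtractTitle(string html)
    {
        Match match = TitlePattern.Match(html ?? string.Empty);
        if (!match.Success)
        {
            return null;
        }

        // Decode entities such as &amp; and collapse newlines and runs of spaces.
        string decoded = WebUtility.HtmlDecode(match.Groups["Title"].Value);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    // Downloads the page at the given URL and extracts its title.
    public static async Task<string> GetTitleAsync(string url)
    {
        string source = await Http.GetStringAsync(url);
        return ExtractTitle(source);
    }
}
```

Usage would look like `string title = await TitleFetcher.GetTitleAsync("http://www.singingeels.com/");`. Keeping `ExtractTitle` separate from the download makes the parsing step easy to test against a plain string.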

小鲜肉

2020-12-15 07:51
In order to accomplish this you are going to need to do a couple of things.
- Make your app threaded, so that you can process multiple requests at a time and maximize the number of HTTP requests that are being made.
- During the async request, download only the amount of data you actually need. You could probably parse the data as it comes back, looking for the <title> element, and stop once you have it.
- You will probably want to use a regex to pull out the title text.
I have done this before with SEO bots and was able to handle almost 10,000 requests at a time. You just need to make sure that each web request is self-contained in its own thread.
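
The steps above could be sketched roughly as follows. This is not the code from the SEO bots mentioned in the answer. It is an illustration that swaps raw threads for `async`/`await` with a `SemaphoreSlim` to cap concurrency, and every name in it (`BulkTitleFetcher`, `ReadTitleAsync`, `GetTitlesAsync`, the 64 KB limit, the default of 50 concurrent requests) is an assumption. It also assumes the pages are UTF-8 encoded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

public static class BulkTitleFetcher
{
    private static readonly HttpClient Http = new HttpClient();
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>", RegexOptions.IgnoreCase);

    // Reads the body chunk by chunk and stops as soon as a complete <title>
    // element has arrived, or once roughly maxBytes have been read.
    public static async Task<string> ReadTitleAsync(Stream body, int maxBytes = 64 * 1024)
    {
        var buffer = new byte[4096];
        var text = new StringBuilder();
        // Assumption: UTF-8. The Decoder keeps multi-byte characters that are
        // split across two chunks intact.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[Encoding.UTF8.GetMaxCharCount(buffer.Length)];
        int total = 0;
        while (total < maxBytes)
        {
            int read = await body.ReadAsync(buffer, 0, buffer.Length);
            if (read == 0) break; // end of stream
            total += read;
            int charCount = decoder.GetChars(buffer, 0, read, chars, 0);
            text.Append(chars, 0, charCount);
            // Re-scanning the accumulated text is fine here because it is capped by maxBytes.
            Match match = TitlePattern.Match(text.ToString());
            if (match.Success) return match.Groups["Title"].Value.Trim();
        }
        return null;
    }

    public static async Task<string> GetTitleAsync(string url)
    {
        // ResponseHeadersRead returns before the body is buffered, so reading can stop early.
        using (HttpResponseMessage response =
            await Http.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        using (Stream body = await response.Content.ReadAsStreamAsync())
        {
            return await ReadTitleAsync(body);
        }
    }

    // Waits for a free slot, fetches one title, and maps request failures to null.
    private static async Task<KeyValuePair<string, string>> FetchGuardedAsync(
        string url, SemaphoreSlim gate)
    {
        await gate.WaitAsync();
        try
        {
            return new KeyValuePair<string, string>(url, await GetTitleAsync(url));
        }
        catch (HttpRequestException)
        {
            return new KeyValuePair<string, string>(url, null);
        }
        finally
        {
            gate.Release();
        }
    }

    // Fetches many titles at once, with at most maxConcurrency requests in flight.
    public static async Task<Dictionary<string, string>> GetTitlesAsync(
        IEnumerable<string> urls, int maxConcurrency = 50)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var results = await Task.WhenAll(
                urls.Distinct().Select(url => FetchGuardedAsync(url, gate)));
            return results.ToDictionary(r => r.Key, r => r.Value);
        }
    }
}
```

Because the requests are I/O-bound, tasks gated by a semaphore keep many of them in flight without dedicating a thread to each one. Only `HttpRequestException` is caught here, so timeouts and malformed URLs would still surface as exceptions.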
逝去的感伤

2020-12-15 07:55
Perhaps this suggestion opens up a new world for you. I had the same question and arrived at this solution:

Download "Html Agility Pack" from http://html-agility-pack.net/?z=codeplex

Or get it from NuGet: https://www.nuget.org/packages/HtmlAgilityPack/ and add the reference to your project.

Add the following using directive to the code file:
```
using HtmlAgilityPack;
```
Write the following code in your method:
```
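// Download and parse the page; url is the address of the site to inspect.
var webGet = new HtmlWeb();
var document = webGet.Load(url);
// SelectSingleNode returns null when the page has no <title> element, hence the ?. operator.
var title = document.DocumentNode.SelectSingleNode("html/head/title")?.InnerText;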
```
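
For reuse, the same idea might be wrapped in a small helper that also works on markup already in memory. This is a sketch only: `HapTitleReader`, `TitleFromHtml` and `TitleFromUrl` are hypothetical names, and the XPath `//head/title` is a slightly looser variant of the one above, so it still matches when the `<html>` wrapper is missing. `HtmlEntity.DeEntitize` turns entities such as `&amp;` back into plain characters.

```csharp
using HtmlAgilityPack;

public static class HapTitleReader
{
    // Extracts the title from an already parsed document; null when there is none.
    private static string TitleFromDocument(HtmlDocument document)
    {
        HtmlNode node = document.DocumentNode.SelectSingleNode("//head/title");
        if (node == null)
        {
            return null;
        }
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Parses markup that is already in memory (no network access).
    public static string TitleFromHtml(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        return TitleFromDocument(document);
    }

    // Downloads the page with HtmlWeb, as in the answer, then extracts the title.
    public static string TitleFromUrl(string url)
    {
        HtmlDocument document = new HtmlWeb().Load(url);
        return TitleFromDocument(document);
    }
}
```

Separating `TitleFromHtml` from `TitleFromUrl` lets the parsing be checked against a literal string before pointing it at a live site.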
Sources:

https://codeshare.co.uk/blog/how-to-scrape-meta-data-from-a-url-using-htmlagilitypack-in-c/

HtmlAgilityPack obtain Title and meta