Use HttpWebRequest to get a remote page's title

Anonymous (unverified), submitted 2019-12-03 10:24:21

Question:

I have a web service that acts as an interface between a farm of websites and some analytics software. Part of the analytics tracking requires harvesting the page title. Rather than passing it from the web page to the web service, I would like to use HttpWebRequest to request the page.

I have code that downloads the entire page and parses the HTML to grab the title tag, but I don't want to download the entire page just to get information that's in the head.

I've started with

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("url");
request.Method = "HEAD";

Answer 1:

Great idea, but a HEAD request only returns the document's HTTP headers. This does not include the title element, which is part of the HTTP message body.



Answer 2:

Try this:

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string page = @"http://stackoverflow.com/";
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(page);

            // Dispose the response and reader so the connection is released.
            using (WebResponse resp = req.GetResponse())
            using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
            {
                char[] buf = new char[256];
                // Accumulate what has been read so that a title straddling
                // two 256-character reads is still matched.
                StringBuilder seen = new StringBuilder();
                int count;
                while ((count = sr.Read(buf, 0, buf.Length)) > 0)
                {
                    seen.Append(buf, 0, count);
                    Match match = Regex.Match(seen.ToString(), @"<title[^>]*>([^<]+)</title>", RegexOptions.IgnoreCase);
                    if (match.Success)
                    {
                        Console.WriteLine(match.Groups[1].Value);
                        break; // stop reading once the title has been found
                    }
                }
            }
        }
    }
}


Answer 3:

If you don't want to request the entire page, you can request it in pieces. The HTTP specification defines a request header called Range. You would use it like this:

Range: bytes=0-100

You can look through the returned content and find the title. If it is not there, then request Range: bytes=101-200 and so on until you get what you need.

Obviously, the web server needs to support range requests, so this may be hit or miss.
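
For what it's worth, here is a rough sketch of how the Range approach could look with HttpWebRequest.AddRange. The class and method names (RangeTitleFetcher, FindTitle, FetchTitle) and the chunk size and byte limit are my own assumptions for illustration, not anything from the question. It also assumes the page is UTF-8 and that no multi-byte character is split across two ranges.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class RangeTitleFetcher
{
    // Returns the text of the first complete <title> element, or null if none is present yet.
    public static string FindTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
                              RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    // Requests the page in chunkSize-byte ranges until a title is found
    // or maxBytes have been requested. Returns null if no title turns up.
    public static string FetchTitle(string url, int chunkSize, int maxBytes)
    {
        StringBuilder seen = new StringBuilder();
        try
        {
            for (int start = 0; start < maxBytes; start += chunkSize)
            {
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
                // Sends "Range: bytes=<start>-<start + chunkSize - 1>".
                req.AddRange(start, start + chunkSize - 1);

                using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
                using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
                {
                    seen.Append(reader.ReadToEnd());
                    string title = FindTitle(seen.ToString());
                    if (title != null) return title;

                    // Anything other than 206 means the server ignored Range and
                    // sent the whole body, so there is nothing more to ask for.
                    if (resp.StatusCode != HttpStatusCode.PartialContent) return null;
                }
            }
        }
        catch (WebException)
        {
            // For example 416 when the range starts past the end of the document.
            return null;
        }
        return null;
    }

    public static void Main()
    {
        // 512-byte chunks and an 8 KB cap are arbitrary example values.
        string title = FetchTitle("http://stackoverflow.com/", 512, 8192);
        Console.WriteLine(title ?? "(no title found)");
    }
}
```

Each range is a separate round trip, so very small chunks can end up slower than simply streaming the page and stopping early. A server that ignores Range replies with 200 and the full body, in which case this ends up reading the whole page anyway.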



Answer 4:

So I would have to go with something like...

// URL holds the address of the page to fetch.
HttpWebRequest req   = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
Stream st            = resp.GetResponseStream();
StreamReader sr      = new StreamReader(st);
string buffer        = sr.ReadToEnd();

// Locate the title element; IndexOf returns -1 when a tag is missing.
int startPos = buffer.IndexOf("<title>", StringComparison.OrdinalIgnoreCase);
int endPos   = buffer.IndexOf("</title>", StringComparison.OrdinalIgnoreCase);
string title = (startPos >= 0 && endPos > startPos)
    ? buffer.Substring(startPos + 7, endPos - (startPos + 7))   // 7 == "<title>".Length
    : string.Empty;

Console.WriteLine("Response code from {0}: {1}", URL, resp.StatusCode);
Console.WriteLine("Page title: {0}", title);
sr.Close();
st.Close();

