Question:
I have a web service that acts as an interface between a farm of websites and some analytics software. Part of the analytics tracking requires harvesting the page title. Rather than passing it from the webpage to the web service, I would like to use HttpWebRequest to call the page.
I have code that downloads the entire page and parses the HTML to grab the title tag, but I don't want to download the whole page just to get information that's in the head.
I've started with
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("url");
    request.Method = "HEAD";
Answer 1:
Great idea, but a HEAD request only returns the document's HTTP headers. This does not include the title element, which is part of the HTTP message body.
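To see this for yourself, here is a minimal sketch that sends a HEAD request and counts how many body bytes come back. The class and method names (HeadProbe, CountBodyBytes) are made up for illustration; the only assumption is that the server answers HEAD at all. On .NET 6 and later, WebRequest.Create still works but produces an obsolete-API warning.

```csharp
using System;
using System.IO;
using System.Net;

public static class HeadProbe
{
    // Sends a HEAD request to url, reports the Content-Type header through
    // contentType, and returns the number of body bytes actually received.
    public static long CountBodyBytes(string url, out string contentType)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "HEAD";

        using (var resp = (HttpWebResponse)req.GetResponse())
        using (Stream body = resp.GetResponseStream())
        {
            // Headers are available as usual...
            contentType = resp.ContentType;

            // ...but the body stream of a HEAD response is empty, so this
            // loop ends on its first read.
            long total = 0;
            var buf = new byte[256];
            int read;
            while ((read = body.Read(buf, 0, buf.Length)) > 0)
                total += read;
            return total;
        }
    }
}
```

Calling HeadProbe.CountBodyBytes("http://stackoverflow.com/", out ct) should return 0 while still filling in ct, which is why the title cannot be recovered this way.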
Answer 2:
Try this:
    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                string page = @"http://stackoverflow.com/";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(page);

                using (WebResponse resp = req.GetResponse())
                using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
                {
                    char[] buf = new char[256];
                    StringBuilder seen = new StringBuilder();
                    int count;

                    // Read 256 characters at a time.
                    while ((count = sr.Read(buf, 0, buf.Length)) > 0)
                    {
                        // Search everything read so far, so a title that straddles
                        // two chunks is still found. The trailing "<" makes sure
                        // the whole title text has arrived before we accept it.
                        seen.Append(buf, 0, count);
                        Match match = Regex.Match(seen.ToString(), @"<title>([^<]+)<", RegexOptions.IgnoreCase);
                        if (match.Success)
                        {
                            Console.WriteLine(match.Groups[1].Value);
                            break; // stop reading; the rest of the page is not needed
                        }
                    }
                }
            }
        }
    }
Answer 3:
If you don't want to request the entire page, you can request it in pieces. The HTTP spec defines a request header called Range. You would use it like this:
Range: bytes=0-100
You can look through the returned content and find the title. If it is not there, then request Range: bytes=101-200 and so on until you get what you need.
Obviously, the web server needs to support range, so this may be hit or miss.
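A rough sketch of that loop, under a few assumptions: the page is UTF-8, the title sits within the first 16 KB, and the server either honors Range (206 Partial Content) or ignores it and sends the whole page (200). The names RangedTitleFetcher, FindTitle and FetchRange, the 512-byte chunk size and the 16 KB cap are all made up for illustration. HttpWebRequest.AddRange is the framework call that writes the Range header.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class RangedTitleFetcher
{
    // Matches <title ...>text</title>; Singleline lets the text span line breaks.
    static readonly Regex TitleRegex = new Regex(
        @"<title[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Walks the document in chunkSize-byte ranges until a complete <title>
    // element is seen, the source runs dry, or maxBytes have been fetched.
    // fetchRange(start, end) must return bytes start..end inclusive (fewer,
    // or none, at the end of the document). Returns null if no title is found.
    public static string FindTitle(Func<long, long, byte[]> fetchRange,
                                   int chunkSize = 512, int maxBytes = 16384)
    {
        var seen = new MemoryStream();
        while (seen.Length < maxBytes)
        {
            byte[] chunk = fetchRange(seen.Length, seen.Length + chunkSize - 1);
            if (chunk.Length == 0) break;
            seen.Write(chunk, 0, chunk.Length);

            // Assumption: the page is UTF-8. Re-decoding everything fetched so
            // far keeps a title that straddles two ranges intact.
            string html = Encoding.UTF8.GetString(seen.ToArray());
            Match m = TitleRegex.Match(html);
            if (m.Success) return m.Groups[1].Value.Trim();
        }
        return null;
    }

    // Fetches bytes start..end of url using an HTTP Range header.
    public static byte[] FetchRange(string url, long start, long end)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.AddRange(start, end); // sends "Range: bytes=start-end"
        try
        {
            using (var resp = (HttpWebResponse)req.GetResponse())
            using (Stream body = resp.GetResponseStream())
            {
                // A 200 instead of a 206 means the server ignored Range and is
                // sending the page from byte 0, so later ranges cannot be trusted.
                if (resp.StatusCode != HttpStatusCode.PartialContent && start > 0)
                    throw new NotSupportedException("Server does not honor Range.");

                // Read at most the number of bytes asked for, then hang up.
                var buffer = new byte[end - start + 1];
                int total = 0, read;
                while (total < buffer.Length &&
                       (read = body.Read(buffer, total, buffer.Length - total)) > 0)
                    total += read;
                Array.Resize(ref buffer, total);
                return buffer;
            }
        }
        catch (WebException ex)
        {
            // 416: the requested range starts past the end of the document.
            var failed = ex.Response as HttpWebResponse;
            if (failed != null &&
                failed.StatusCode == HttpStatusCode.RequestedRangeNotSatisfiable)
                return new byte[0];
            throw;
        }
    }
}
```

Usage would look something like string title = RangedTitleFetcher.FindTitle((start, end) => RangedTitleFetcher.FetchRange("http://stackoverflow.com/", start, end));. Passing the fetch step in as a delegate keeps the search loop independent of the network, so it can be tried against an in-memory byte array first.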
Answer 4:
So I would have to go with something like...
    // URL is assumed to hold the address of the page to fetch.
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
    {
        // Downloads the whole page before searching it.
        string buffer = sr.ReadToEnd();

        // Ordinal comparison avoids culture-specific case rules for markup.
        int startPos = buffer.IndexOf("<title>", StringComparison.OrdinalIgnoreCase);
        int endPos = buffer.IndexOf("</title>", StringComparison.OrdinalIgnoreCase);

        Console.WriteLine("Response code from {0}: {1}", URL, resp.StatusCode);

        // Only slice when both tags were found, in the right order.
        if (startPos >= 0 && endPos > startPos)
        {
            startPos += "<title>".Length;
            string title = buffer.Substring(startPos, endPos - startPos);
            Console.WriteLine("Page title: {0}", title);
        }
    }