I'm looking for a method that will allow me to get the title of a webpage and store it as a string.
However, all the solutions I have found so far involve downloading the page.
As the `<title>` tag is in the HTML itself, there is no way to find "just the title" without downloading the file. You should be able to download a portion of the file until you've read in the `<title>` tag, or the `</head>` tag, and then stop, but you'll still need to download (at least a portion of) the file.
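If the server supports HTTP range requests, you can additionally ask it for only the first few kilobytes. The sketch below assumes an arbitrary 8 KB limit; the `RangeRequest.Create` helper is a hypothetical name, while `HttpWebRequest.AddRange` is the standard .NET method for setting the `Range` header. A server is free to ignore the header and send the full body anyway, so you still need to stop reading on your own once the title has been found.

```csharp
using System.Net;

public static class RangeRequest
{
    // Hypothetical helper: builds a GET request that asks the server for only
    // the first maxBytes bytes of the resource. Nothing is sent over the
    // network until GetResponse() is called on the returned request.
    public static HttpWebRequest Create(string url, int maxBytes = 8192)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Adds the header "Range: bytes=0-(maxBytes - 1)"
        request.AddRange(0, maxBytes - 1);
        return request;
    }
}
```

A `206 Partial Content` status means the server honoured the range; a plain `200 OK` means it ignored the header and is sending the whole file.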
This can be accomplished with HttpWebRequest/HttpWebResponse by reading data from the response stream until we've read in either a `<title>` block or the `</head>` tag. I added the `</head>` check because, in valid HTML, the title element must appear within the head block, so with this check we never parse the entire file (unless there is no head block, of course).
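To make the stopping rule concrete without a network connection, here is a minimal sketch of the same loop over any `TextReader`. The class and method names (`TitleScanner.ReadTitle`) are hypothetical, and unlike the answer's regex this one also tolerates attributes on the `<title>` tag. It returns the title (or `null`) together with the number of characters consumed, which shows that reading stops early.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class TitleScanner
{
    // Text between <title ...> and </title>; Singleline lets '.' cross line breaks.
    private static readonly Regex TitleRegex = new Regex(
        @"<title[^>]*>\s*(.+?)\s*</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Reads the input in chunks of chunkSize characters and stops as soon as a
    // complete <title> element or a closing </head> tag has been seen.
    public static (string Title, int CharsRead) ReadTitle(TextReader reader, int chunkSize = 8192)
    {
        var buffer = new char[chunkSize];
        var contents = new StringBuilder();
        int length;
        while ((length = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            contents.Append(buffer, 0, length);
            string text = contents.ToString();

            Match m = TitleRegex.Match(text);
            if (m.Success)
                return (m.Groups[1].Value, contents.Length);

            // The title must live inside <head>, so once </head> has been
            // read there is no point in downloading any more.
            if (text.IndexOf("</head>", StringComparison.OrdinalIgnoreCase) >= 0)
                return (null, contents.Length);
        }
        return (null, contents.Length); // end of input, no title found
    }
}
```

In practice you would pass in a `StreamReader` wrapped around `response.GetResponseStream()`; with a `StringReader` you can check the behaviour offline.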
The following should be able to accomplish this task:
    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Text.RegularExpressions;

    string title = "";
    try {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        // StreamReader keeps UTF-8 decoding correct even when a multi-byte
        // character is split between two reads
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8)) {
            // compiled regex to check for <title></title> block;
            // Singleline lets the title text span line breaks
            Regex titleCheck = new Regex(@"<title>\s*(.+?)\s*</title>",
                RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);
            char[] buffer = new char[8192];
            StringBuilder contents = new StringBuilder();
            int length;
            while ((length = reader.Read(buffer, 0, buffer.Length)) > 0) {
                // add the newly read characters to the rest of the
                // contents that have been downloaded so far
                contents.Append(buffer, 0, length);
                string text = contents.ToString();
                Match m = titleCheck.Match(text);
                if (m.Success) {
                    // we found a match =]
                    title = m.Groups[1].Value;
                    break;
                } else if (text.IndexOf("</head>", StringComparison.OrdinalIgnoreCase) >= 0) {
                    // reached end of head-block; no title found =[
                    break;
                }
            }
        }
    } catch (Exception e) {
        Console.WriteLine(e);
    }
UPDATE: Updated the original source-example to use a compiled Regex and a using statement for the Stream for better efficiency and maintainability.
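The value captured by the regex is the raw HTML text between the tags, so entities such as `&amp;` are still encoded, and a title written across several lines keeps its line breaks. A possible post-processing step is sketched below, assuming you want plain display text; `TitleCleaner.Clean` is a hypothetical name, while `WebUtility.HtmlDecode` is the standard .NET entity decoder.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // Hypothetical helper: decodes HTML entities (e.g. &amp;, &#39;) and
    // collapses every run of whitespace, including line breaks, to one space.
    public static string Clean(string rawTitle)
    {
        if (rawTitle == null) return null;
        string decoded = WebUtility.HtmlDecode(rawTitle);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Usage: `title = TitleCleaner.Clean(title);` after the loop has finished.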