I want to extract contents of title tag from html string. I have done some search but so far i am not able to find such code in VB/C# or PHP. Also this should work with both upper and lower case tags e.g. should work with both <title></title>
and <TITLE></TITLE>
. Thank you.
You can use regular expressions for this but it's not completely error-proof. It'll do if you just want something simple though (in PHP):
function get_title($html) {
return preg_match('!<title>(.*?)</title>!i', $html, $matches) ? $matches[1] : '';
}
Sounds like a job for a regular expression. This will depend on the HTML being well-formed, i.e., only finds the title element inside a head element.
Regex regex = new Regex( ".*<head>.*<title>(.*)</title>.*</head>.*",
RegexOptions.IgnoreCase );
Match match = regex.Match( html );
string title = match.Groups[0].Value;
I don't have my regex cheat sheet in front of me so it may need a little tweaking. Note that there is also no error checking in the case where no title element exists.
If there is any attribute in the title tag (which is unlikely but can happen) you need to update the expression as follows:
$title = preg_match('!<title.*>(.*?)</title>!i', $url_content, $matches) ? $matches[1] : '';
来源:https://stackoverflow.com/questions/717100/extract-title-tag-from-html