When trying to extract the title of an HTML page, I have always used the following regex:
(?<=<title>)([\\s\\S]*)(?=</title>)
Toss out the idea of parsing HTML with regular expressions and use an actual HTML parsing library instead. After a quick search I found this one. It's a much safer way to extract information from an HTML file.
Remember, HTML is not a regular language, so regular expressions are fundamentally the wrong tool for extracting information from it.
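To make the parser approach concrete, here is a minimal sketch using Python's standard-library `html.parser` module. That is my choice for illustration, not the library linked above, and the names `TitleParser` and `extract_title` are hypothetical. It collects the text between the first `<title>` start tag and its end tag, and the parser takes care of tag case, attributes, and line breaks.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while we are between <title> and </title>
        self._done = False       # True once the first title has been closed
        self._parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lowercase, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # None means no <title> element was seen at all.
        if not self._done and not self._parts:
            return None
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the stripped text of the first <title>, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    page = "<html><head><TITLE>\n  Hello, world\n</TITLE></head><body></body></html>"
    print(extract_title(page))
```

Unlike the greedy `[\s\S]*` pattern, this stops at the first closing `title` tag, so a later `</title>` elsewhere in the document (inside an embedded SVG, for example) does not get swallowed into the result.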