I hope this question is not an RTFM one.
I am trying to write a Python script that extracts links from a standard HTML webpage (the tags).
I hav
No, there isn't.
You can consider using Beautiful Soup. You can call it the de facto standard for parsing HTML files in Python.
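As a rough illustration of the Beautiful Soup suggestion, here is a minimal sketch. It assumes the third-party `beautifulsoup4` package is installed and that the links of interest are `href` attributes on `<a>` tags; the function name `extract_hrefs`, its `tag` parameter, and the sample markup are made up for this example, not taken from the question.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def extract_hrefs(html, tag="a"):
    """Return the href value of every <tag> element that has an href.

    `tag` is a hypothetical parameter: pass "link" or another tag name
    if those are the elements you care about.
    """
    soup = BeautifulSoup(html, "html.parser")  # stdlib-backed parser
    # href=True keeps only elements that actually carry an href attribute.
    return [element["href"] for element in soup.find_all(tag, href=True)]


# Made-up sample page for illustration only.
sample = """
<html><body>
<a href="https://example.com/one">one</a>
<!-- <a href="https://example.com/hidden">commented out</a> -->
<a name="anchor-without-href">no href</a>
<a href="/two">two</a>
</body></html>
"""

print(extract_hrefs(sample))  # ['https://example.com/one', '/two']
```

The commented-out anchor is skipped because the parser treats it as a comment node rather than a tag, and the anchor without an `href` is filtered out by `href=True`.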
Shouldn't a link be matchable by a well-defined regex?
No, [X]HTML is not in the general case parseable with regex. Consider examples like:
<link title='hello">world' href="x">link</link>
<!-- <link href="x">not a link</link> -->
<![CDATA[ ><link href="x">not a link</link> ]]>
<script>document.write('<link href="x">not a link</link>')</script>
and that's just a few random valid examples; if you have to cope with real-world tag-soup HTML there are a million malformed possibilities.
If you know and can rely on the exact output format of the target page you can get away with regex. Otherwise it is completely the wrong choice for scraping web pages.
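To make the comment and script cases above concrete, here is a small sketch that compares a naive regex with the standard library's `html.parser`. The pattern `NAIVE_LINK`, the class `LinkCollector`, the helper names, and the sample page are all invented for illustration; a cleverer regex could dodge these two cases, but would still trip over others.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from real <link> start tags only."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Called only for actual tags, not for text inside comments
        # or inside <script> elements.
        if tag == "link":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def hrefs_by_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# A deliberately naive pattern (an assumption for this demo).
NAIVE_LINK = re.compile(r'<link\b[^>]*\bhref="([^"]*)"')


def hrefs_by_regex(html):
    return NAIVE_LINK.findall(html)


# Made-up page: one real link, one commented out, one inside a script string.
page = '''<link href="real.css">
<!-- <link href="commented-out.css"> -->
<script>document.write('<link href="from-script.css">')</script>'''

print(hrefs_by_regex(page))   # ['real.css', 'commented-out.css', 'from-script.css']
print(hrefs_by_parser(page))  # ['real.css']
```

The regex reports two links that are not part of the document at all, while the parser only sees the one real tag.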