The DOI system places almost no useful restrictions on what constitutes a valid identifier: beyond the "10." prefix and the registrant code, the suffix can be nearly any string of characters. However, being able to pull DOIs out of PDFs, web pages, etc. is quite useful.
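The following regex (Perl syntax) should catch most DOIs:
/(10\.\d+(?:\.\d+)*\/\S+)/
A suffix pattern of \d+ would miss the many DOIs whose suffix is not purely numeric (e.g. 10.1371/journal.pone.0012345), hence \S+. Note that \S+ is greedy and will also grab trailing punctuation such as a full stop or closing parenthesis, so you may need to trim that off afterwards.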
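For illustration, here is a minimal Python sketch of the extraction step. Python's re module accepts essentially the same syntax as the Perl pattern (the / needs no escaping outside Perl's delimiters). It uses a broadened suffix of \S+ rather than \d+, and the set of trailing characters it trims is a heuristic assumption, not part of any DOI specification; a DOI that genuinely ends in one of those characters would be truncated.

```python
import re

# "10.", a numeric registrant code with optional dotted subdivisions, "/",
# then any run of non-whitespace characters as the suffix.
DOI_PATTERN = re.compile(r"(10\.\d+(?:\.\d+)*/\S+)")

# Hypothetical heuristic: characters that usually belong to the surrounding
# text (sentence punctuation, brackets, quotes) rather than to the DOI.
TRAILING_PUNCTUATION = ".,;:)]}\"'>"


def find_dois(text):
    """Return candidate DOIs found in text, in order of appearance."""
    # \S+ is greedy, so strip punctuation the match picked up at the end.
    return [m.group(1).rstrip(TRAILING_PUNCTUATION) for m in DOI_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "See doi:10.1371/journal.pone.0012345. Also 10.1000/182, and (10.1038/nphys1170)."
    print(find_dois(sample))
    # prints ['10.1371/journal.pone.0012345', '10.1000/182', '10.1038/nphys1170']
```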
You can do some additional sanity checking by opening the URLs
http://hdl.handle.net/<doi>
and
http://dx.doi.org/<doi>
where <doi> is the candidate DOI,
and testing that a) you get a 200 OK HTTP status (after following the resolver's redirect), and b) the returned page is not the service's "DOI not found" page.
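A minimal Python sketch of that check, using only the standard library. The helper names, the 10-second timeout, the 64 KiB read limit and the exact not-found marker text are assumptions for illustration, not documented behaviour of either service; check what each resolver currently returns before relying on the marker.

```python
import urllib.parse
import urllib.request

# The two resolver base URLs from the text above.
RESOLVERS = ("http://hdl.handle.net/", "http://dx.doi.org/")

# Assumed marker text (compared case-insensitively) for a "not found" page.
NOT_FOUND_MARKER = "doi not found"


def resolver_urls(doi):
    """Build one lookup URL per resolver, percent-encoding the DOI but keeping '/'."""
    quoted = urllib.parse.quote(doi, safe="/")
    return [base + quoted for base in RESOLVERS]


def looks_resolvable(status, body):
    """Apply the two tests: a 200 status, and no not-found text in the page."""
    return status == 200 and NOT_FOUND_MARKER not in body.lower()


def check_doi(doi, timeout=10.0):
    """Return True if any resolver yields a page that passes looks_resolvable()."""
    for url in resolver_urls(doi):
        try:
            # urlopen follows HTTP redirects, so status is that of the final page.
            with urllib.request.urlopen(url, timeout=timeout) as response:
                # Only the start of the page is needed to spot the marker.
                body = response.read(65536).decode("utf-8", errors="replace")
                if looks_resolvable(response.status, body):
                    return True
        except OSError:
            # HTTPError (e.g. a 404), URLError and socket timeouts are all
            # OSError subclasses; treat any of them as "not resolved here".
            continue
    return False
```

Treat a False result as a hint rather than proof: some publisher sites refuse scripted requests, which can make a perfectly valid DOI fail the check.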