Finding a DOI in a document or page

悲&欢浪女 2021-01-29 21:43

The DOI system places basically no useful limitations on what constitutes a reasonable identifier. However, being able to pull DOIs out of PDFs, web pages, etc. is quite useful.

7 answers
  •  刺人心
    刺人心 (OP)
    2021-01-29 22:17

    The following regex should do the job for DOIs with an all-numeric suffix (Perl regex syntax):

    /(10\.\d+\/\d+)/
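
For anyone not working in Perl, here is a rough Python sketch of the same idea. `NUMERIC_DOI` mirrors the regex above, so it only catches DOIs whose suffix is all digits. `LOOSE_DOI`, the `find_dois` helper, and the trailing-punctuation stripping are my own assumptions, not part of the answer, and the loose pattern will produce more false positives.

```python
import re

# Same pattern as the Perl regex above: "10.", a numeric prefix, "/", a numeric suffix.
NUMERIC_DOI = re.compile(r"(10\.\d+/\d+)")

# Hypothetical looser pattern (an assumption, not from the answer): accept any
# run of non-whitespace characters as the suffix.
LOOSE_DOI = re.compile(r"(10\.\d+/\S+)")


def find_dois(text, pattern=NUMERIC_DOI):
    """Return candidate DOIs found in text, in order of appearance."""
    # Heuristic: drop sentence punctuation that the loose pattern drags in.
    return [match.rstrip(".,;") for match in pattern.findall(text)]


if __name__ == "__main__":
    sample = "See doi:10.1000/182 and 10.1016/j.cell.2009.01.001."
    print(find_dois(sample))
    print(find_dois(sample, LOOSE_DOI))
```

Whatever pattern you pick, treat the matches as candidates only and check them before trusting them.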
    

    You could do some additional sanity checking by opening the URLs

    http://hdl.handle.net/<doi>

    and

    http://dx.doi.org/<doi>

    where <doi> is the candidate DOI,

    and testing that you a) get a 200 OK HTTP status, and b) the returned page is not the "DOI not found" page for the service.
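
A minimal Python sketch of that sanity check, using only the standard library. Several things here are assumptions rather than anything the answer specifies: the names `candidate_urls` and `resolves`, the `NOT_FOUND_MARKER` text (check what the services actually return today), and the choice to accept a candidate as soon as either resolver passes both tests. The `fetch` parameter exists only so the function can be exercised without network access.

```python
import urllib.parse
import urllib.request

# Resolver base URLs taken from the answer above.
RESOLVERS = ("http://hdl.handle.net/", "http://dx.doi.org/")

# Assumed marker text for the resolver's error page; verify before relying on it.
NOT_FOUND_MARKER = "doi not found"


def candidate_urls(doi):
    """Build one lookup URL per resolver for the candidate DOI."""
    return [base + urllib.parse.quote(doi, safe="/") for base in RESOLVERS]


def resolves(doi, fetch=urllib.request.urlopen, timeout=10):
    """Return True if any resolver answers 200 with a page that is not the error page."""
    for url in candidate_urls(doi):
        try:
            with fetch(url, timeout=timeout) as response:
                status = response.status
                body = response.read().decode("utf-8", errors="replace")
        except OSError:
            # urllib's URLError and HTTPError (e.g. a 404) are OSError subclasses.
            continue
        if status == 200 and NOT_FOUND_MARKER not in body.lower():
            return True
    return False
```

Calling `resolves("10.1000/182")` performs real HTTP requests and follows redirects to the landing page, so a site that blocks scripted clients could make a valid DOI look invalid.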
