Check if a JavaScript string is a URL

野趣味 2020-11-22 15:41

Is there a way in JavaScript to check if a string is a URL?

RegExes are excluded because the URL is most likely written like stackoverflow; that is to s

30 answers
  •  生来不讨喜
    2020-11-22 16:30

    Mathias Bynens has compiled a list of well-known URL regexes with test URLs. There is little reason to write a new regular expression; just pick an existing one that suits you best.

    But the comparison table for those regexes also shows that it is next to impossible to do URL validation with a single regular expression. All of the regexes in Bynens' list produce false positives and false negatives.

    I suggest that you use an existing URL parser (for example new URL('http://www.example.com/') in JavaScript) and then apply the checks you want to perform against the parsed and normalized form of the URL or its components. Using the JavaScript URL interface has the additional benefit that it only accepts URLs that the browser itself accepts.
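
    A minimal sketch of that approach follows. The function name parseHttpUrl and the restriction to http and https are assumptions made for illustration; only new URL() and its properties belong to the URL interface.

```javascript
// Hypothetical helper: returns the parsed URL object if `input` is an
// absolute http(s) URL, otherwise null.
function parseHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws a TypeError if the string cannot be parsed
  } catch {
    return null;
  }
  // new URL() also accepts mailto:, javascript:, data: and other schemes,
  // so the scheme has to be checked separately (an assumption of this sketch).
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return null;
  }
  return url;
}

// Further checks then run against the normalized components.
const example = parseHttpUrl('HTTP://WWW.Example.COM:80/a/../b');
console.log(example.href);                  // "http://www.example.com/b"
console.log(parseHttpUrl('stackoverflow')); // null (no scheme, no base URL)
```

    Returning the parsed object rather than a boolean lets the caller continue with hostname, port and the other components without parsing twice.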

    You should also keep in mind that technically incorrect URLs may still work. For example, http://w_w_w.example.com/ and http://www..example.com/ have an invalid hostname part, and http://123.example.com/ is invalid under the older RFC 952 rule that a label must start with a letter (RFC 1123 relaxed that rule). Every browser I know will nevertheless try to open them without complaint, and if you map those names to IP addresses in /etc/hosts, such URLs will even work, but only on your computer.

    The question is, therefore, not so much whether a URL is valid, but rather which URLs work and should be allowed in a particular context.

    If you want to do URL validation there are a lot of details and edge cases that are easy to overlook:

    • URLs may contain credentials as in http://user:password@www.example.com/.
    • Port numbers must be in the range of 0-65535, but you may still want to exclude the wildcard port 0.
    • Port numbers may have leading zeros as in http://www.example.com:000080/.
    • IPv4 addresses are by no means restricted to 4 decimal integers in the range of 0-255. You can use one to four integers, and they can be decimal, octal or hexadecimal. The URLs https://010.010.000010.010/, https://0x8.0x8.0x0008.0x8/, https://8.8.2056/, https://8.526344/, https://134744072/ are all valid and just creative ways of writing https://8.8.8.8/.
    • Allowing loopback addresses (http://127.0.0.1/), private IP addresses (http://192.168.1.1), link-local addresses (http://169.254.100.200) and so on may have an impact on security or privacy. If, for instance, you allow them as the address of user avatars in a forum, you cause the users' browsers to send unsolicited requests into their local network. With internet-of-things devices on that network, such requests may cause funny and not so funny things to happen in their homes.
    • For the same reasons, you may want to discard links to hostnames that are not fully qualified, in other words hostnames without a dot.
    • But hostnames may always have a trailing dot (as in http://www.stackoverflow.com.).
    • The hostname portion of a link may contain square brackets for IPv6 addresses as in http://[::1].
    • IPv6 addresses also have ranges for private networks or link-local addresses etc.
    • If you block certain IPv4 addresses, keep in mind that for example https://127.0.0.1 and https://[::ffff:127.0.0.1] point to the same resource (if the loopback device of your machine is IPv6 ready).
    • The hostname portion of URLs may now contain Unicode, so that the character range [-0-9a-zA-Z] is definitely no longer sufficient.
    • Many registries for top-level domains define specific restrictions, for example on the allowed set of Unicode characters. Or they subdivide their namespace (like co.uk and many others).
    • Top-level domains must not contain decimal digits or hyphens, except in IDN A-labels, which start with the prefix "xn--" and whose Punycode may contain digits (as in xn--p1ai).
    • Unicode top-level domains (and their Punycode encoding with "xn--") must still contain only letters, but who wants to check that in a regex?
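
    Several of the host-related points above can be checked on the parsed hostname instead of the raw string, because the URL parser already folds octal, hexadecimal and shortened IPv4 notations into dotted decimal. The following is only a sketch: the function names are hypothetical, and the list of address ranges is an assumption that covers the ranges mentioned above, not every special-purpose range.

```javascript
// Hypothetical helper: true for loopback, private, link-local and
// "this network" IPv4 ranges, given the first two octets.
function isNonPublicIPv4(a, b) {
  return (
    a === 0 ||                            // 0.0.0.0/8
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // 127.0.0.0/8 (loopback)
    (a === 169 && b === 254) ||           // 169.254.0.0/16 (link-local)
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

// Hypothetical helper: true if the URL's host looks local rather than public.
function pointsToLocalNetwork(input) {
  const { hostname } = new URL(input); // throws on unparsable input

  // IPv4: the parser has already normalized 0x7f.1, 010.010.000010.010 etc.
  const v4 = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(hostname);
  if (v4) return isNonPublicIPv4(Number(v4[1]), Number(v4[2]));

  // IPv6: the hostname is serialized in lowercase inside square brackets.
  if (hostname.startsWith('[')) {
    if (hostname === '[::1]' || hostname === '[::]') return true;
    // IPv4-mapped addresses are serialized as [::ffff:7f00:1].
    const mapped = /^\[::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})\]$/.exec(hostname);
    if (mapped) {
      const high = parseInt(mapped[1], 16);
      return isNonPublicIPv4(high >> 8, high & 0xff);
    }
    const first = parseInt(/^\[([0-9a-f]{0,4}):/.exec(hostname)[1] || '0', 16);
    return (first & 0xfe00) === 0xfc00 || // fc00::/7 (unique local)
           (first & 0xffc0) === 0xfe80;   // fe80::/10 (link-local)
  }

  // Domain name: ignore one trailing dot, then require at least one dot.
  const name = hostname.endsWith('.') ? hostname.slice(0, -1) : hostname;
  return !name.includes('.');
}

console.log(new URL('https://0x8.0x8.0x0008.0x8/').hostname);         // "8.8.8.8"
console.log(pointsToLocalNetwork('https://134744072/'));              // false
console.log(pointsToLocalNetwork('http://[::ffff:127.0.0.1]/'));      // true
console.log(pointsToLocalNetwork('http://www.stackoverflow.com./'));  // false
```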

    Which of these limitations and rules apply is a question of project requirements and taste.
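
    One way to keep such decisions configurable is to pass the rules in as options. This is a sketch under stated assumptions: checkUrl, the option names and the reason strings are all hypothetical, and the defaults merely reflect what might suit user-supplied links.

```javascript
// Hypothetical default rules; every project will choose its own.
const defaultRules = {
  allowedProtocols: ['http:', 'https:'],
  allowCredentials: false, // http://user:password@www.example.com/
  allowPortZero: false,    // the wildcard port 0
};

// Hypothetical validator: returns { ok: true, url } or { ok: false, reason }.
function checkUrl(input, rules = {}) {
  const r = { ...defaultRules, ...rules };
  let url;
  try {
    // Rejects unparsable input, including ports above 65535.
    url = new URL(input);
  } catch {
    return { ok: false, reason: 'unparsable' };
  }
  if (!r.allowedProtocols.includes(url.protocol)) {
    return { ok: false, reason: 'protocol' };
  }
  if (!r.allowCredentials && (url.username !== '' || url.password !== '')) {
    return { ok: false, reason: 'credentials' };
  }
  // url.port is a normalized string: leading zeros are gone and the
  // scheme's default port is reported as the empty string.
  if (!r.allowPortZero && url.port === '0') {
    return { ok: false, reason: 'port' };
  }
  return { ok: true, url };
}

console.log(checkUrl('http://user:password@www.example.com/').reason); // "credentials"
console.log(checkUrl('http://www.example.com:000080/').ok);            // true
```

    Returning a reason instead of a bare boolean makes it easier to tell users why a link was rejected.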

    I have recently written a URL validator for a web app that is suitable for user-supplied URLs in forums, social networks, or the like. Feel free to use it as a base for your own:

    • JavaScript/TypeScript version for the (Angular) frontend
    • Perl version for the backend

    I have also written a blog post, "The Gory Details of URL Validation", with more in-depth information.
