Regular expression to get an attribute from HTML tag

难免孤独 2020-12-03 03:55

I am looking for a regular expression that can get me the src attribute (case-insensitive) from the following HTML snippets in Java.



        
4 answers
  •  清歌不尽
    2020-12-03 04:08

    One possibility:

    String imgRegex = "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>";
    

    It has to be matched case-insensitively (for example by compiling it with Pattern.CASE_INSENSITIVE). It's a bit of a mess, and it deliberately ignores the case where the attribute value isn't quoted. Here it is without the Java string escapes:

    <img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>
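    A minimal sketch of the two usual ways to get case-insensitive matching in Java, assuming the complete expression begins with <img[^>]+ as the imgRegex name suggests. The class name CaseInsensitiveImgSrc and the helper firstSrc are hypothetical, chosen for illustration. Pattern.CASE_INSENSITIVE and the inline (?i) flag are standard java.util.regex features and behave the same here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class CaseInsensitiveImgSrc {
    // The expression above as a Java string literal (assumes it starts with <img).
    static final String IMG_REGEX =
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>";

    // Option 1: pass the CASE_INSENSITIVE flag when compiling.
    static final Pattern WITH_FLAG =
            Pattern.compile(IMG_REGEX, Pattern.CASE_INSENSITIVE);

    // Option 2: prefix the pattern with the equivalent inline flag (?i).
    static final Pattern WITH_INLINE_FLAG = Pattern.compile("(?i)" + IMG_REGEX);

    // Hypothetical helper: first captured src value, or null if nothing matches.
    static String firstSrc(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<IMG ALT=\"logo\" SRC=\"logo.png\">";
        System.out.println(firstSrc(WITH_FLAG, html));        // logo.png
        System.out.println(firstSrc(WITH_INLINE_FLAG, html)); // logo.png
        // Without a flag, "<img" and "src" do not match the upper-case tag.
        System.out.println(firstSrc(Pattern.compile(IMG_REGEX), html)); // null
    }
}
```

    The inline (?i) form is handy when the pattern string is stored in configuration and you cannot control the flags passed to Pattern.compile.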
    

    This matches:

    • <img, the start of the tag
    • one or more characters that aren't > (i.e. possible other attributes)
    • src
    • optional whitespace
    • =
    • optional whitespace
    • starting delimiter of ' or "
    • the image source, captured as group 1 (it may not contain a single or double quote)
    • ending delimiter
    • although the expression can stop here, I then added:
      • zero or more characters that are not > (more possible attributes)
      • > to close the tag
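    To pull the image source out of every matching tag, loop over Matcher.find() and read group 1 each time. This is a sketch assuming the full expression starts with <img[^>]+; the class name ImgSrcExtractor and the method extractSrcs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class ImgSrcExtractor {
    // Same expression as above, compiled case-insensitively; group 1 is the src value.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects the src of every matched <img> tag, in document order.
    static List<String> extractSrcs(String html) {
        List<String> srcs = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            srcs.add(m.group(1));
        }
        return srcs;
    }

    public static void main(String[] args) {
        String html = "<p>Intro</p>"
                + "<img src=\"/images/a.png\" alt=\"A\">"
                + "<IMG class='thumb' SRC = 'b.jpg' />";
        System.out.println(extractSrcs(html)); // [/images/a.png, b.jpg]
    }
}
```

    The second tag shows the optional whitespace around = and the single-quote delimiter being accepted, and the trailing [^>]* swallowing the " /" of a self-closing tag.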

    Things to note:

    • If you want the captured group to include the src= as well, move the opening parenthesis of the group further left :-)
    • This does not check that the opening and closing delimiters match, does not handle attribute values without delimiters, and can choke on badly formed attributes (such as attribute values that contain >, or image sources that contain ' or ").
    • Parsing HTML with regular expressions like this is non-trivial, and at best a quick hack that works in the majority of cases.
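    The limitations listed above are easy to see with a few inputs. This sketch assumes the same <img[^>]+... expression compiled case-insensitively; ImgSrcCaveats and srcOf are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class ImgSrcCaveats {
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: the captured src value, or null when nothing matches.
    static String srcOf(String html) {
        Matcher m = IMG_SRC.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Unquoted value: deliberately unsupported, so there is no match.
        System.out.println(srcOf("<img src=a.png>"));                  // null
        // Mismatched delimiters (' then "): not detected, so it still matches.
        System.out.println(srcOf("<img src='a.png\">"));               // a.png
        // A '>' inside an earlier attribute stops [^>]+ before src: no match.
        System.out.println(srcOf("<img alt=\"a > b\" src=\"a.png\">")); // null
    }
}
```

    If these cases matter for your input, a real HTML parser is the more robust choice than extending the expression further.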
