Regex to return all attributes of a web page that starts by a specific value

好久不见. Submitted on 2019-12-13 11:21:29

Question


The question is simple: I need to get the values of all attributes whose value starts with http://example.com/api/v3?. For example, if a page contains

<iframe src="http://example.com/api/v3?download=example%2Forg">
<meta twitter="http://example.com/api/v3?return_to=%2F">

Then I should get an array/list with 2 members: http://example.com/api/v3?return_to=%2F and http://example.com/api/v3?download=example%2Forg (the order doesn’t matter).

I don’t want the elements, just the attribute’s value.
Basically, I need a regex that returns strings starting with http://example.com/api/v3? and ending with a space.


Answer 1:


A regular expression would likely look like this:

/http:\/\/example\.com\/api\/v3\?\S+/g

Make sure to escape each / and ? with a backslash. \S+ matches all subsequent non-space characters, so on quoted attributes it runs on through the closing quote until the next whitespace. You can use [^\s"]+ instead of \S+ if you want the match to stop at the quote mark.
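
As a rough sketch of how the pattern could be applied, the snippet below runs it over the raw HTML text of the two tags from the question. The `html` string is a stand-in for however you obtain the page source, and adding the single quote to the character class is an assumption for pages that quote attributes with '. Note that raw source may contain entity-encoded characters such as `&amp;`, which this approach would return undecoded.

```javascript
// Hypothetical sample input: the two tags from the question, as raw HTML text.
const html = [
  '<iframe src="http://example.com/api/v3?download=example%2Forg">',
  '<meta twitter="http://example.com/api/v3?return_to=%2F">',
].join("\n");

// Same prefix as above, but the match stops at whitespace or a quote,
// so the closing quote of the attribute is not captured.
const pattern = /http:\/\/example\.com\/api\/v3\?[^\s"']+/g;

// With a global regex, match() returns all matches, or null if there are none.
const urls = html.match(pattern) || [];

console.log(urls);
```

The `|| []` fallback keeps the result an array even when the page contains no matching URL.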

In my experience, though, regexes are usually slower than working on already parsed objects directly, so I’d recommend you try these Array and DOM functions instead:

Get all elements, map them to their attributes and filter those that start with http://example.com/api/v3?, reduce all attributes lists to one Array and map those attributes to their values.

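// Take every element in the document
Array.from(document.querySelectorAll("*"))
  // keep only the attributes whose value starts with the prefix
  .map(elem => Array.from(elem.attributes)
    .filter(attr => attr.value.startsWith("http://example.com/api/v3?")))
  // flatten the per-element lists into a single array
  .reduce((list, attrList) => list.concat(attrList), [])
  // and map each attribute to its value
  .map(attr => attr.value);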

You can find polyfills for ES6 and ES5 functions and can use Babel or related tools to convert the code to ES5 (or replace the arrow functions by hand).
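
If you would rather convert by hand, a hand-written ES5 version might look roughly like the sketch below. The function name `collectAttributeValuesES5` and its `root` and `prefix` parameters are made up for illustration; `root` is assumed to be `document` or any other node that provides `querySelectorAll`.

```javascript
// ES5-only sketch: no arrow functions, Array.from, or String.prototype.startsWith.
// `root` is assumed to expose querySelectorAll (e.g. `document` in a browser).
function collectAttributeValuesES5(root, prefix) {
  var values = [];
  var elements = root.querySelectorAll("*");
  for (var i = 0; i < elements.length; i++) {
    var attrs = elements[i].attributes;
    for (var j = 0; j < attrs.length; j++) {
      // indexOf(prefix) === 0 is the ES5 equivalent of startsWith(prefix)
      if (attrs[j].value.indexOf(prefix) === 0) {
        values.push(attrs[j].value);
      }
    }
  }
  return values;
}

// In a browser:
// collectAttributeValuesES5(document, "http://example.com/api/v3?");
```

Plain loops also avoid building the intermediate per-element arrays that the chained version creates.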




Answer 2:


There is the CSS selector * meaning "any element".

There is no CSS selector meaning "any attribute with this value". Attribute names are arbitrary. While there are several attributes defined in the HTML specs, it's possible to use custom ones like the twitter attribute in your example. This means you'll have to iterate over all the attributes on a given element.

Without a global attribute value selector, you will need to manually iterate over all elements and their attribute values. You may be able to find some heuristics that narrow down the search before resorting to brute force.
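
One possible heuristic, sketched here under the assumption that you know in advance which attribute names can carry the URL, is to build a selector from the standard `[attr^="value"]` prefix selector for each of those names. The helper `buildPrefixSelector` and the attribute list are hypothetical, and the names are assumed to be valid CSS identifiers. Any attribute not on the list, such as an unexpected custom one, would be missed.

```javascript
// Build a selector like [src^="..."], [href^="..."] for a known list of attribute names.
function buildPrefixSelector(attrNames, prefix) {
  // Escape backslashes and double quotes so the prefix is a valid CSS string.
  const escaped = prefix.replace(/["\\]/g, "\\$&");
  return attrNames.map(name => `[${name}^="${escaped}"]`).join(", ");
}

// Hypothetical list of attribute names to check.
const selector = buildPrefixSelector(
  ["src", "href", "twitter"],
  "http://example.com/api/v3?"
);

console.log(selector);

// In a browser, this narrows the candidates before reading attribute values:
// document.querySelectorAll(selector);
```

The elements returned this way still need their attributes inspected to pull out the matching values, but the set is much smaller than every element on the page.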



Source: https://stackoverflow.com/questions/39822557/regex-to-return-all-attributes-of-a-web-page-that-starts-by-a-specific-value
