Preg_match_all <a


天命终不由人

Hello, I want to extract links, and I want a regex which gives me /portal/clients/show/entityId/2121 the nu


6 answers

既然无缘

2020-12-02 03:07
Regex for parsing links is something like this:
```
'/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i'
```
Given how horrible that is, I would recommend using Simple HTML DOM to get the links, at least. You can then check each link's href with a very basic regex.
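As a rough illustration of the "very basic regex on the link href" step, here is a minimal sketch. It assumes the goal is the trailing number of a path shaped like the one in the question (the question text is cut off, so that is a guess), and `extractEntityId` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: returns the numeric ID if $href has the shape
// /portal/clients/show/entityId/<digits>, otherwise null.
function extractEntityId(string $href): ?int
{
    // "#" is used as the delimiter so the slashes in the path need no escaping.
    if (preg_match('#^/portal/clients/show/entityId/(\d+)/?$#', $href, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

var_dump(extractEntityId('/portal/clients/show/entityId/2121'));
var_dump(extractEntityId('/about'));
```

Because the href has already been pulled out by a parser, the pattern only has to describe the path itself, which keeps it short.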

难免孤独

2020-12-02 03:13

Don't use regular expressions for processing XML/HTML. This can be done very easily using the built-in DOM parser:

```
$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
    # Xpath query for attributes gives a NodeList containing DOMAttr objects.
    # http://php.net/manual/en/class.domattr.php
    echo $nodeList->item($i)->value . "<br/>\n";
}
```
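Real pages are often not well formed, and `loadHTML` reports that through libxml warnings. The sketch below shows one way to silence those warnings and keep only the links under the path from the question. The sample markup and the `$clientLinks` variable are made up for illustration.

```php
<?php
// Made-up sample markup; note the unclosed <p> tag.
$htmlAsString = '<p>Clients<a href="/portal/clients/show/entityId/2121">A</a>'
              . '<a href="/about">About</a>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$clientLinks = [];
foreach ($xpath->query('//a/@href') as $attr) {
    // Keep only hrefs that start with the path prefix from the question.
    if (strpos($attr->value, '/portal/clients/show/entityId/') === 0) {
        $clientLinks[] = $attr->value;
    }
}

print_r($clientLinks);
```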


面向向阳花

2020-12-02 03:14

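PHP Simple HTML DOM Parser example:

```
// The library must be loaded first; this path is an assumption.
require_once 'simple_html_dom.php';

// Create DOM from string
$html = str_get_html($links);

// or from a URL (the scheme is required)
$html = file_get_html('http://www.example.com/');

// Print the href of every <a> element
foreach ($html->find('a') as $link) {
    echo $link->href . '<br />';
}
```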


鱼传尺愫

2020-12-02 03:22

This is my solution:

```
<?php
// Download the contents of www.example.com.
$website = file_get_contents("http://www.example.com");

// Capture the value of every double-quoted href on an <a> tag.
// Explicit "/" delimiters are used; group 1 holds the URL itself.
preg_match_all('/<a\s[^>]*?href="(.+?)"/i', $website, $matches);

// $matches[1] contains only the captured URLs, so no further cleanup is needed.
print_r($matches[1]);
?>
```

I recommend avoiding XML-based parsers, because you will not always know whether the document/website is well formed.

Best regards


花落未央

2020-12-02 03:22

Parsing links from HTML can be done using an HTML parser.

Once you have all the links, simply get the index of the last forward slash; everything after it is your number. No regex needed.
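The last-slash idea can be sketched with `strrpos` and `substr`. `lastPathSegment` is a hypothetical helper name, and the sketch assumes the href has no trailing slash or query string.

```php
<?php
// Hypothetical helper: returns whatever follows the last "/" in $href.
function lastPathSegment(string $href): string
{
    $pos = strrpos($href, '/');
    // strrpos returns false when there is no slash; in that case return the whole string.
    return $pos === false ? $href : substr($href, $pos + 1);
}

echo lastPathSegment('/portal/clients/show/entityId/2121'), "\n"; // prints 2121
```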

情歌与酒

2020-12-02 03:26

When "parsing" HTML I mostly rely on PHPQuery: http://code.google.com/p/phpquery/ rather than regex.
