I have written a function that finds all the URLs in an HTML file and repeats the same process for each HTML page those URLs link to. The function is recursiv
Rather than making recursive function calls, use a queue to flatten the structure: take one URL off the front, and append any URLs found on that page to the back.
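$queue = array('http://example.com/first/url');
$visited = array(); // URLs already processed, so pages linking to each other do not loop forever

while (count($queue) > 0) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue; // Skip pages that were already crawled
    }
    $visited[$url] = true;

    // Append the links found on this page to the end of the queue
    $queue = array_merge($queue, find_urls($url));
}

function find_urls($url)
{
    $urls = array();
    // Some logic filling the variable
    return $urls;
}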
There are several ways to build on this. If you need insight into where a URL came from or which path led to it, store more than the bare URL in each queue entry, for example the page it was found on. Distributed queues can also work off a similar model.
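As a rough sketch of keeping extra information per queue entry, each item below carries the URL, the page it was found on, and its depth. The `crawl()` function, the `find_urls_stub()` helper, the `$links` array, and the `$maxDepth` limit are all hypothetical names made up for this illustration; the in-memory link map stands in for real fetching and parsing so the example runs without network access.

```php
<?php
// Hypothetical in-memory link graph, standing in for real HTTP fetching and HTML parsing.
$links = array(
    'http://example.com/a' => array('http://example.com/b', 'http://example.com/c'),
    'http://example.com/b' => array('http://example.com/c', 'http://example.com/a'),
    'http://example.com/c' => array(),
);

// Stand-in for find_urls(): returns the URLs linked from $url.
function find_urls_stub($url, array $links)
{
    return isset($links[$url]) ? $links[$url] : array();
}

// Queue-based crawl that records, for each URL, where it was first found and at what depth.
function crawl($start, array $links, $maxDepth)
{
    $queue = array(array('url' => $start, 'origin' => null, 'depth' => 0));
    $seen = array();

    while (count($queue) > 0) {
        $item = array_shift($queue);
        // Skip URLs already recorded and anything beyond the depth limit.
        if (isset($seen[$item['url']]) || $item['depth'] > $maxDepth) {
            continue;
        }
        $seen[$item['url']] = array('origin' => $item['origin'], 'depth' => $item['depth']);

        // Enqueue each discovered link together with its origin and depth.
        foreach (find_urls_stub($item['url'], $links) as $found) {
            $queue[] = array(
                'url' => $found,
                'origin' => $item['url'],
                'depth' => $item['depth'] + 1,
            );
        }
    }

    return $seen;
}

$result = crawl('http://example.com/a', $links, 5);
```

Because the queue is processed first in, first out, the recorded origin is the first page on which each URL was encountered, and the depth limit gives a simple way to stop the crawl from growing without bound.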