Dijkstra algorithm optimization/caching

Submitted by 匆匆过客 on 2019-11-29 16:12:17

Some micro-optimisations. First, don't write:

for ($p = 0; $p < sizeof($nextStopArray); $p++) { 
   ...
}

Instead, calculate sizeof($nextStopArray) once before the loop. Otherwise the count is recomputed on every iteration, even though the value never changes:

$nextStopArraySize = sizeof($nextStopArray);
for ($p = 0; $p < $nextStopArraySize; ++$p) { 
   ...
}

There are a couple of places in your code where this should be changed.
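To make the cost visible, the sketch below wraps the size calculation in a hypothetical counting helper (countingSize is illustration only, not part of the question's code) and compares how often it runs with and without hoisting.

```php
<?php
// Hypothetical helper: counts how many times the size is requested.
function countingSize(array $arr, int &$calls): int
{
    $calls++;
    return count($arr); // sizeof() is an alias of count()
}

$nextStopArray = ['A', 'B', 'C', 'D', 'E'];

// Size inside the condition: evaluated N + 1 times for N elements.
$callsInLoop = 0;
for ($p = 0; $p < countingSize($nextStopArray, $callsInLoop); $p++) {
    // ... loop body ...
}

// Size computed once before the loop: exactly one call.
$callsHoisted = 0;
$nextStopArraySize = countingSize($nextStopArray, $callsHoisted);
for ($p = 0; $p < $nextStopArraySize; ++$p) {
    // ... loop body ...
}

echo "in-loop: $callsInLoop, hoisted: $callsHoisted\n"; // in-loop: 6, hoisted: 1
```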

If you're iterating several thousand times, ++$p is also slightly faster than $p++.

But profile the function first: find out which parts take the longest to execute, and focus your optimisation there.
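A real profiler (for example Xdebug's profiler) gives the most detail, but a coarse manual timer is often enough to see where the time goes. This is a minimal sketch assuming PHP 7.4 or later; SectionTimer and the section names are hypothetical, and the closures are stand-ins for the real neighbour query and relaxation step.

```php
<?php
// Hypothetical helper for coarse manual profiling: accumulates elapsed
// wall-clock time (in milliseconds) per named section.
final class SectionTimer
{
    /** @var array<string, float> */
    private array $totals = [];

    public function measure(string $section, callable $fn)
    {
        $start = hrtime(true); // monotonic clock in nanoseconds (PHP >= 7.3)
        try {
            return $fn();
        } finally {
            $elapsedMs = (hrtime(true) - $start) / 1e6;
            $this->totals[$section] = ($this->totals[$section] ?? 0.0) + $elapsedMs;
        }
    }

    /** @return array<string, float> totals, slowest section first */
    public function report(): array
    {
        $totals = $this->totals;
        arsort($totals);
        return $totals;
    }
}

$timer = new SectionTimer();
for ($i = 0; $i < 100; ++$i) {
    // Stand-ins: a slow "database" step and a cheap in-memory step.
    $timer->measure('neighbour lookup', fn () => usleep(100));
    $timer->measure('relaxation', fn () => array_sum(range(1, 100)));
}
print_r($timer->report());
```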

EDIT

Get rid of array_push_key as a separate function and execute its body inline; otherwise it costs you an unnecessary function call every time it is used.
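The question's array_push_key() is not shown here, so the sketch below assumes a common definition of a helper with that name (it assigns a value under a key) and shows the inline equivalent producing the same array.

```php
<?php
// ASSUMPTION: a common definition of array_push_key(); the question's own
// version is not shown and may differ.
function array_push_key(array &$array, $key, $value): void
{
    $array[$key] = $value;
}

$costs = ['S1' => 0, 'S2' => 4, 'S3' => 7]; // hypothetical stop => cost data

// Before: one userland function call per insertion.
$viaFunction = [];
foreach ($costs as $stop => $cost) {
    array_push_key($viaFunction, $stop, $cost);
}

// After: the same assignment written inline, with no call overhead.
$inline = [];
foreach ($costs as $stop => $cost) {
    $inline[$stop] = $cost;
}

var_dump($viaFunction === $inline); // bool(true)
```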

Build an array of all nodes from your database outside the while (true) loop: retrieve all the data with a single SQL query and build a lookup array from it.
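A minimal sketch of that lookup array, assuming the single query is something like SELECT stop_id, route_id, next_stop FROM db_stop_times and the rows are fetched as associative arrays (for example with PDOStatement::fetchAll(PDO::FETCH_ASSOC)). buildNeighbourLookup and the sample rows are hypothetical.

```php
<?php
/**
 * Groups db_stop_times rows by stop_id so neighbours can be found with an
 * array lookup instead of one query per node.
 * ASSUMPTION: each row has stop_id, route_id and next_stop keys.
 */
function buildNeighbourLookup(iterable $rows): array
{
    $lookup = [];
    foreach ($rows as $row) {
        $lookup[$row['stop_id']][] = [
            'route_id'  => $row['route_id'],
            'next_stop' => $row['next_stop'],
        ];
    }
    return $lookup;
}

// Hypothetical rows standing in for the result of the single query.
$rows = [
    ['stop_id' => 1, 'route_id' => 10, 'next_stop' => 2],
    ['stop_id' => 1, 'route_id' => 11, 'next_stop' => 3],
    ['stop_id' => 2, 'route_id' => 10, 'next_stop' => 3],
];
$lookup = buildNeighbourLookup($rows);

// Inside the while (true) loop, this replaces the per-node query.
$activeNode = 1;
$neighbours = $lookup[$activeNode] ?? []; // stops with no outgoing rows give []
echo count($neighbours), "\n"; // 2
```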

Replacing

for ($p = 0; $p < sizeof($nextStopArray); $p++) { 

with

$nextStopArraySize = sizeof($nextStopArray);
$p = -1;
while (++$p < $nextStopArraySize) { 
   ...
}

may be faster still (just check that the loop still runs the correct number of times).
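One way to do that check is to record the indices each form visits and compare them. This sketch uses a small hypothetical array in place of $nextStopArray's real contents.

```php
<?php
$nextStopArray = ['A', 'B', 'C', 'D']; // hypothetical contents
$nextStopArraySize = count($nextStopArray);

// for loop with the size hoisted out of the condition
$forVisited = [];
for ($p = 0; $p < $nextStopArraySize; ++$p) {
    $forVisited[] = $p;
}

// while variant: start at -1 and pre-increment inside the condition
$whileVisited = [];
$p = -1;
while (++$p < $nextStopArraySize) {
    $whileVisited[] = $p;
}

// Both visit indices 0..3, and $p ends equal to the array size.
var_dump($forVisited === $whileVisited); // bool(true)
```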

At a glance (you should really do some profiling, by the way), the likely culprit is that you execute a separate query for each graph node to find its neighbors:

$result=mysql_query("SELECT route_id, next_stop FROM db_stop_times WHERE stop_id = $activeNode", $connection);

If you have 1,700 nodes, a single pathfinding run is going to issue on the order of a thousand queries, one for each node it expands. Rather than hitting the database that often, cache these results in something like memcached, and fall back to the database only on cache misses.
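The cache-aside pattern described here can be sketched as follows. The CacheStore interface, ArrayCache class and getNeighbours function are hypothetical; ArrayCache only lives for one request, whereas a Memcached-backed store would persist across requests. With the Memcached extension, a miss shows up as Memcached::get() returning false with getResultCode() equal to Memcached::RES_NOTFOUND, which is why this sketch reports hits explicitly: an empty neighbour list is a valid cached value.

```php
<?php
// ASSUMPTION: hypothetical cache abstraction; a Memcached-backed class could
// implement the same interface in production.
interface CacheStore
{
    /** Returns [true, value] on a hit, [false, null] on a miss. */
    public function get(string $key): array;
    public function set(string $key, $value): void;
}

final class ArrayCache implements CacheStore
{
    private array $data = [];

    public function get(string $key): array
    {
        return array_key_exists($key, $this->data) ? [true, $this->data[$key]] : [false, null];
    }

    public function set(string $key, $value): void
    {
        $this->data[$key] = $value;
    }
}

/** Returns the neighbours of $stopId, querying the database only on a miss. */
function getNeighbours(int $stopId, CacheStore $cache, callable $loadFromDb): array
{
    $key = "stop_neighbours:$stopId";
    [$hit, $value] = $cache->get($key);
    if ($hit) {
        return $value;
    }
    $value = $loadFromDb($stopId); // cache miss: fall back to the database
    $cache->set($key, $value);
    return $value;
}

// Fake loader standing in for the per-stop SELECT; counts database hits.
$dbCalls = 0;
$fakeDb = function (int $stopId) use (&$dbCalls): array {
    $dbCalls++;
    return [['route_id' => 10, 'next_stop' => $stopId + 1]];
};

$cache = new ArrayCache();
getNeighbours(5, $cache, $fakeDb);
getNeighbours(5, $cache, $fakeDb); // served from the cache
echo $dbCalls, "\n"; // 1
```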

"it's using too much resources"

Which resource? (CPU? Memory? Network bandwidth? I/O load on the database server?)

while (true) {       
    $result=mysql_query("SELECT route_id, next_stop FROM db_stop_times WHERE stop_id = $activeNode", $connection);

If I'm reading this right, you make a database call for every node in every pathfinding attempt. Each call blocks while it waits for the database's response. Even with a fast database, that's bound to take a couple of milliseconds per call (unless the database runs on the same server as your code). So I'd venture a guess that most of your execution time is spent waiting for replies from the database.
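As a rough, illustrative estimate: assume one query per node, the 1,700-node figure mentioned earlier in this thread, and about 2 ms per round trip (an assumed value, not a measurement). The waiting time alone for one pathfinding run is then

```latex
T_{\text{wait}} \approx N_{\text{queries}} \times t_{\text{round trip}}
              = 1700 \times 2\,\text{ms}
              = 3400\,\text{ms}
              \approx 3.4\,\text{s}
```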

Moreover, if your database lacks proper indexes, each of these queries could do a full table scan of db_stop_times.

The solution is straightforward: Load db_stop_times into memory at application startup, and use that in-memory representation when resolving neighbour nodes.
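A sketch of Dijkstra running entirely on such an in-memory representation, so each neighbour lookup is $graph[$node] rather than a query. The question's own implementation isn't shown, so this is one common PHP formulation, not the asker's code. ASSUMPTIONS: $graph maps a stop id to a list of [next_stop, cost] pairs, and the cost (for example a travel time) is hypothetical, since the query only shows route_id and next_stop. SplPriorityQueue is a max-heap without decrease-key, so priorities are negated and stale entries are skipped when extracted.

```php
<?php
/**
 * Dijkstra over an in-memory adjacency list.
 * ASSUMPTION: $graph is stop_id => list of [next_stop, cost].
 * Returns [distances, predecessors].
 */
function dijkstra(array $graph, int $source): array
{
    $dist = [$source => 0];
    $prev = [];
    $queue = new SplPriorityQueue();
    $queue->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
    $queue->insert($source, 0); // max-heap, so priorities are negated distances

    while (!$queue->isEmpty()) {
        ['data' => $node, 'priority' => $negDist] = $queue->extract();
        if (-$negDist > $dist[$node]) {
            continue; // stale entry: a shorter path was already found
        }
        foreach ($graph[$node] ?? [] as [$next, $cost]) {
            $candidate = $dist[$node] + $cost;
            if (!isset($dist[$next]) || $candidate < $dist[$next]) {
                $dist[$next] = $candidate;
                $prev[$next] = $node;
                $queue->insert($next, -$candidate);
            }
        }
    }
    return [$dist, $prev];
}

/** Walks the predecessor map back from $target; [] if unreachable. */
function pathTo(array $prev, int $source, int $target): array
{
    $path = [$target];
    while ($target !== $source) {
        if (!isset($prev[$target])) {
            return [];
        }
        $target = $prev[$target];
        array_unshift($path, $target);
    }
    return $path;
}

// Hypothetical graph loaded once at startup.
$graph = [
    1 => [[2, 4], [3, 1]],
    3 => [[2, 2], [4, 5]],
    2 => [[4, 1]],
];
[$dist, $prev] = dijkstra($graph, 1);
echo implode(' -> ', pathTo($prev, 1, 4)), "\n"; // 1 -> 3 -> 2 -> 4
```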

Edit: Yes, an index on stop_id is the right index for this query. As for practical caching: I don't know PHP, but in something like Java (or C#, or C++, or even C) I'd use a representation of the form

class Node {
    Link[] links;
}

class Link {
    int time;
    Node destination;
}

That would be a bit faster than memcached, but it assumes you can comfortably fit the entire table in main memory. If you can't, I'd use a caching system like memcached.
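For the PHP side, a rough translation of the Node/Link representation above might look like this (PHP 7.4 or later for typed properties and ??=). ASSUMPTIONS: buildGraph is hypothetical, and it takes [from_stop, to_stop, time] triples loaded from db_stop_times; the time column is not part of the query shown in the question. Each Node holds direct references to its destinations, so following a link is a property access rather than a query.

```php
<?php
// PHP rendering of the Node/Link classes sketched above.
final class Link
{
    public int $time;
    public Node $destination;

    public function __construct(int $time, Node $destination)
    {
        $this->time = $time;
        $this->destination = $destination;
    }
}

final class Node
{
    /** @var Link[] outgoing links */
    public array $links = [];
}

/**
 * Builds the object graph once at startup.
 * ASSUMPTION: $edges is a list of [from_stop, to_stop, time] triples.
 *
 * @return array<int, Node> nodes keyed by stop id
 */
function buildGraph(array $edges): array
{
    $nodes = [];
    foreach ($edges as [$from, $to, $time]) {
        $nodes[$from] ??= new Node();
        $nodes[$to] ??= new Node();
        $nodes[$from]->links[] = new Link($time, $nodes[$to]);
    }
    return $nodes;
}

$nodes = buildGraph([[1, 2, 4], [1, 3, 1], [3, 2, 2]]);
echo count($nodes[1]->links), "\n"; // 2
```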
