Good graph traversal algorithm

终归单人心 2020-12-16 18:21

Abstract problem: I have a graph of about 250,000 nodes and the average connectivity is around 10. Finding a node's connections is a long process (10 seconds, let's say). Sa

4 answers

抹茶落季 (original poster)

2020-12-16 18:56

I am really confused as to why it takes 10 seconds to add a node to the DB. That sounds like a problem. What database are you using? Do you have severe platform restrictions?

With modern systems, and their oodles of memory, I would recommend a nice simple cache of some kind. You should be able to create a very quick cache of user information that would allow you to avoid repeated work. When you have encountered a node already, stop processing. This will avoid cycling forever in cliques.
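A minimal sketch of the cache idea above, in Python. The names `crawl` and `fetch_neighbors` are hypothetical; `fetch_neighbors` stands in for whatever slow lookup returns a node's connections. The cache doubles as the "already encountered" check, so each node is fetched at most once even inside a clique.

```python
from collections import deque


def crawl(start, fetch_neighbors):
    """Breadth-first crawl that calls fetch_neighbors at most once per node.

    fetch_neighbors is a hypothetical callable standing in for the slow
    lookup (about 10 seconds per node in the question).
    """
    cache = {}               # node -> list of its neighbors
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in cache:    # already encountered: stop processing
            continue
        cache[node] = list(fetch_neighbors(node))
        # Only queue neighbors that have not been fetched yet.
        queue.extend(n for n in cache[node] if n not in cache)
    return cache
```

With 250,000 nodes and about 10 connections each, a dictionary like this fits comfortably in memory on a typical machine; persisting it between runs would be a separate design choice.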

If you need to allow for revisiting existing nodes after a while, you can use a last_visit_number, which would be a global value in the DB. If a node carries the current number, then this crawl is the one that encountered it. If you want to automatically revisit all nodes, you just need to bump the last_visit_number before starting the crawl.
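The last_visit_number scheme might look roughly like this. It is an in-memory sketch only: the class `CrawlState` and its method names are hypothetical, and in practice both the global counter and the per-node value would be stored in the database.

```python
class CrawlState:
    """In-memory stand-in for the global counter and per-node visit numbers."""

    def __init__(self):
        self.last_visit_number = 0   # global value (would live in the DB)
        self.node_visit = {}         # node -> visit number recorded on the node

    def start_crawl(self):
        # Bumping the global number makes every node look unvisited again.
        self.last_visit_number += 1

    def needs_visit(self, node):
        # A node tagged with the current number was seen during this crawl.
        return self.node_visit.get(node) != self.last_visit_number

    def mark_visited(self, node):
        self.node_visit[node] = self.last_visit_number
```

The appeal of this design is that starting a fresh crawl is a single counter update; no per-node "visited" flags have to be cleared.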

By your description, I am not quite sure how you are getting stuck.

Edit ------ I just noticed you had a concrete question. In order to increase how quickly you pull in new data, I would keep track of the number of times a given user was linked to in your data (imported or not yet imported). When choosing a user to crawl, I would pick users that have a low number of links. I would specifically go for either the lowest number of links or a random choice among the users with the lowest number of links.
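One way to sketch that selection rule: keep a count of how often each user has been linked to, then pick at random among the not-yet-imported users with the lowest count. The function name `pick_next_user` and the `link_counts` / `imported` structures are assumptions made for illustration.

```python
import random


def pick_next_user(link_counts, imported, rng=random):
    """Return a not-yet-imported user with the fewest inbound links.

    link_counts: dict mapping user -> number of times that user was linked to
    imported:    set of users that have already been crawled
    Ties are broken randomly. Returns None when nothing is left to crawl.
    """
    candidates = [user for user in link_counts if user not in imported]
    if not candidates:
        return None
    lowest = min(link_counts[user] for user in candidates)
    return rng.choice([u for u in candidates if link_counts[u] == lowest])
```

For example, with `link_counts = {"a": 5, "b": 1, "c": 1}` and `imported = {"b"}`, the only remaining user with the lowest count is `"c"`, so that is the one chosen.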

Jacob
