问题
I want to record all used ports within huge pcaps. There are 65535 ports available, and each port is able to talk each other port: 65535 x 65535 links in total
The matrix will be very sparse (many 0 entries). Additionally, I think the edges don't have to be directed, so Port1->Port2 may be added to Port2->Port1 (which reduces our amount of values to 65535 * 65536 / 2). How would you store this using python? In numpy? What will be the estimated amount of memory consumption for this?
Afterwards, I want to find the greatest sum for one port and pop() it (the whole row and column while). This means, i want to find e.g. that Port1 was used 500 times (100 times from Port2 to Port1, 300 times from Port3 to Port1, Port4 to Port1 100times)...
Graphically spoken, I want to have 65535 nodes that could be connected with each other. Then I want to find the node that has the highest sum of values on connected edges. Afterwards, I want to pop the node (and delete the corresponding edges, which will decrease the sum of other nodes).
Thanks!
回答1:
In Python, and depending on how sparse is sparse, a dict-of-dicts will handle this quite well.
connections = { ..., 8080: { 4545:17, 20151:3, ...}, ...}
If I have understood what you are doing correctly, then the count of connections to port p is
count = sum( connections[8080].values() )
removing port p is
del connections[p]
for conn in connections.values(): # edit, bug fixed.
if p in conn:
del conn[p]
If you want to try to save memory by storing only half the pairs, then simplicity suffers greatly.
回答2:
Look into the adjacency list representation of Graph, it will most probably suits your needs.
However, a graph containing 65535 vertices is not that big. Even if you cannot represent it with a simple matrix.
The memory consumption is O(E+V) with V number of vertices (65535) and E number of edges (on a sparse graph, it has the same magnitude order than V).
来源:https://stackoverflow.com/questions/37271660/big-graph-in-memory