Question:
I\'ve profiled my Python program to death, and there is one function that is slowing everything down. It uses Python dictionaries heavily, so
This would require a fair amount of work, but...you might consider using Floyd-Warshall running on a GPU. There has been a lot of work done on making Floyd-Warshall run very efficiently on a GPU. A quick google search yields:
http://cvit.iiit.ac.in/papers/Pawan07accelerating.pdf
http://my.safaribooksonline.com/book/programming/graphics/9780321545411/gpu-computing-for-protein-structure-prediction/ch43lev1sec2#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODAzMjE1NDU0MTEvNDg3
http://www.gpucomputing.net/?q=node/1203
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter43.html
Even though, as implemented in Python, Floyd-Warshall was slower by an order of magnitude, a good GPU version on a powerful GPU might still significantly outperform your new Python code.
Here's an anecdote. I had a short, simple, compute-intensive piece of code that did something similar to a hough accumulation. In Python, optimized as I could get it, it took ~7s on a speedy i7. I then wrote a completely non-optimized GPU version; it took ~0.002s on an Nvidia GTX 480. YMMV, but for anything significantly parallel, the GPU is likely to be a long term winner, and since it's a well-studied algorithm, you should be able to utilize existing highly-tuned code.
For the Python / GPU bridge, I'd recommend PyCUDA or PyOpenCL.