I\'m writing optimised Floyd-Warshall algorithm on GPU using numba. I need it to work in a few seconds in case of 10k matricies. Right now the processing is done in around 6