Question
I am trying to train my deep learning model on Google Colab, which offers a free K80 GPU. I learned that it can be used for 12 hours at a time, after which you have to reconnect. But my connection is lost after 10-15 minutes and I cannot reconnect (it stays stuck on "Initializing"). What's the issue here?
Answer 1:
I have been able to run a vision training model, but it disconnects and stops sometime overnight, after running for hours (possibly 12). I also trained the model on the CPU and got the same result, although with fewer epochs completed. I have searched for the CPU time limit without success. The training program uses tensorflow.saver to write checkpoints during training, so training can be restarted from the last checkpoint when it is interrupted.
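The checkpoint-and-resume pattern described above can be sketched without any framework. This is a minimal, hypothetical illustration, not the answerer's code: the JSON file, the `state` dict, and the `train`/`save_checkpoint`/`load_checkpoint` names are assumptions standing in for real model weights. In TensorFlow 1.x the same role is played by `tf.train.Saver` (`saver.save(sess, path)` and `saver.restore(sess, tf.train.latest_checkpoint(dir))`), and in TensorFlow 2 by `tf.train.Checkpoint` with `tf.train.CheckpointManager`. On Colab, the VM's local disk does not survive a runtime reset, so checkpoints you intend to resume from should be written somewhere persistent, such as a mounted Google Drive.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the training state atomically so a disconnect mid-write can't corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename replaces any older checkpoint


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weights": [0.0, 0.0]}  # hypothetical initial "model"


def train(path, total_epochs, stop_after=None):
    """Hypothetical training loop that resumes from the checkpoint at `path`.

    `stop_after` simulates a runtime disconnect after that many epochs in this session.
    """
    state = load_checkpoint(path)
    epochs_run = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and epochs_run >= stop_after:
            return state  # the session "disconnects" here
        # Placeholder update standing in for one epoch of real training.
        state["weights"] = [w + 0.5 for w in state["weights"]]
        state["epoch"] += 1
        epochs_run += 1
        save_checkpoint(path, state)  # checkpoint after every completed epoch
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    train(ckpt, total_epochs=5, stop_after=2)  # first session dies after 2 epochs
    final = train(ckpt, total_epochs=5)        # new session resumes at epoch 2
    print(final)  # {'epoch': 5, 'weights': [2.5, 2.5]}
```

Checkpointing after every epoch trades a little I/O for losing at most one epoch of work when the runtime drops; for long epochs you may prefer to checkpoint every N steps instead.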
Answer 2:
This turned out to be a network issue at my university, which requires logging in through a portal to access the internet. Bypassing the portal solved the problem.
Source: https://stackoverflow.com/questions/49611472/google-colaboratory-disconnects-after-10-15-minutes