Google Colaboratory: misleading information about its GPU (only 5% RAM available to some users)

Submitted by 老子叫甜甜 on 2019-11-28 14:58:52

To prevent another dozen answers suggesting the !kill -9 -1 workaround, which is invalid in the context of this thread, let's close this thread:

The answer is simple:

As of this writing, Google simply gives only 5% of the GPU RAM to some of us, while giving 100% to others. Period.

mar-2019 update: A year later, Google finally noticed this thread and sent @AmiF to discredit it, implying that everybody who has this problem is simply an incompetent user who can't figure out how to reset their runtime to recover memory. @AmiF further suggests that perhaps this problem was just a bug in their code and that we users can't tell a company policy from a bug.

Unfortunately, no full disclosure is being made, and we are left only with guesses as to what might really be going on. Clearly, a for-profit company will have reservations about who it is generous to, so it's impossible to rule out discrimination here. It makes total sense and is very logical. Since this resource is provided for free, we can't really complain; we are only asking why some of us are being blacklisted while others coming from otherwise identical setups/locales are not.

Since my personal account got removed from the blacklist in dec-2018 (see my update below), I can only rely on other users who are still on the blacklist to help speak the truth. As I'm writing this update, this thread got yet another upvote.

That said, let's hope that Google will come around and end the blacklisting, at least for those who ask to be removed from it. Most of us haven't done anything incriminating to end up on such a list; we simply got caught by immature machine-learning brains and are given no chance to prove ourselves not guilty. @AmiF suggested reporting this problem at http://github.com/googlecolab/colabtools/issues - if you report the problem and get brushed away with your ticket closed without investigation, as in this case, please post the link to your unresolved issue in the comments of this answer, so that we can ask for some accountability.

And, of course, before you upvote this thread, please perform "Reset all runtimes" in the Runtime menu in Colab and check whether you perhaps simply had unfinished notebooks still consuming GPU RAM, in which case you are not impacted by the blacklisting policy at all.

Once the upvoting stops, we will know that this discrimination policy has been abolished. Unfortunately, as of this update, that is not the case, which renders @AmiF's comments below highly dubious.

dec-2018 update: I have a theory that Google may have a blacklist of certain accounts, or perhaps browser fingerprints, when its robots detect non-standard behavior. It could be a total coincidence, but for quite some time I had an issue with Google reCAPTCHA on any website that happened to require it: I'd have to go through dozens of puzzles before I'd be allowed through, often taking me 10+ minutes. This lasted for many months. All of a sudden, as of this month, I get no puzzles at all, and any Google reCAPTCHA gets resolved with a single mouse click, as it used to almost a year ago.

And why am I telling this story? Because at the same time I was given 100% of the GPU RAM on Colab. That's why my suspicion is that if you are on a theoretical Google blacklist, then you aren't trusted to be given a lot of resources for free. I wonder if any of you find the same correlation between limited GPU access and the reCAPTCHA nightmare. As I said, it could just as well be a total coincidence.

Last night I ran your snippet and got exactly what you got:

Gen RAM Free: 11.6 GB  | Proc size: 666.0 MB
GPU RAM Free: 566MB | Used: 10873MB | Util  95% | Total 11439MB

but today:

Gen RAM Free: 12.2 GB  | Proc size: 131.5 MB
GPU RAM Free: 11439MB | Used: 0MB | Util   0% | Total 11439MB
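
(For reference, the snippet that prints these two lines is roughly the following; this is a minimal sketch that assumes the gputil, psutil and humanize packages, as in the snippet from the question, run in a single Colab cell.)

!pip install gputil psutil humanize
import os
import psutil
import humanize
import GPUtil

# general (host) RAM and the size of the current process
process = psutil.Process(os.getpid())
print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
      " | Proc size: " + humanize.naturalsize(process.memory_info().rss))

# free/used/total memory of the first GPU
gpu = GPUtil.getGPUs()[0]
print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
    gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))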

I think the most probable reason is that the GPUs are shared among VMs, so each time you restart the runtime you have a chance of switching GPUs, and there is also a chance of switching to one that is being used by other users.

UPDATED: It turns out that I can use the GPU normally even when GPU RAM Free is 504 MB, which I had thought was the cause of the ResourceExhaustedError I got last night.

If you execute a cell that just has
!kill -9 -1
in it, that'll cause all of your runtime's state (including memory, filesystem, and GPU) to be wiped clean and restarted. Wait 30-60s and press the CONNECT button at the top-right to reconnect.
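
If you want to double-check that the GPU memory really was released after reconnecting, you can run nvidia-smi from a cell (assuming it is available on the Colab VM, which it normally is):

# after reconnecting, confirm that the GPU memory is free again
!nvidia-smi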

A misleading description on the part of Google. I got too excited about it too, I guess. I set everything up, loaded the data, and now I am not able to do anything with it because only 500 MB of memory is allocated to my notebook.

Find the python3 PID and kill it, for example with ps or top.

Note: kill only the python3 process (pid=130 in my case), not the jupyter python process (pid=122).
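
A rough way to do this from a notebook cell (a sketch; the PID 130 is just the example from above, so substitute whatever ps reports on your VM):

# list python processes so you can spot the stale python3 (not the jupyter/notebook process itself)
!ps aux | grep python
# kill the offending process; replace 130 with the PID you found
!kill -9 130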

Restart Jupyter IPython Kernel:

!pkill -9 -f ipykernel_launcher

I'm not sure whether this blacklisting is true! It's rather possible that the GPUs are shared among users. I also ran the test, and my results are the following:

Gen RAM Free: 12.9 GB | Proc size: 142.8 MB
GPU RAM Free: 11441MB | Used: 0MB | Util 0% | Total 11441MB

It seems I'm also getting the full GPU. However, I ran it a few times and got the same result. Maybe I will repeat this check a few times during the day to see whether there is any change.

I believe that if we have multiple notebooks open, just closing them doesn't actually stop the processes. I haven't figured out how to stop them cleanly, but I used top to find the PID of the python3 process that had been running the longest and was using most of the memory, and I killed it. Everything is back to normal now.
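
For example, something along these lines from a notebook cell (a sketch; replace 12345 with the PID that top actually reports):

# one-shot top sorted by memory usage, to spot the long-running python3 process
!top -b -n 1 -o %MEM | head -n 20
# then kill it by PID
!kill -9 12345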

As Siddarthan suggested:

!pkill -9 -f ipykernel_launcher

This freed up the space.
