Why is my multi-threaded Python program slow on an EC2 micro instance?

佐手、 posted on 2021-02-16 09:23:28

Question


I am working on an online judge code checker. My code uses multi-threading in Python 2.7. On my local machine (Core i3, 4 GB RAM) the program evaluates about 1000 submissions in 1 minute 10 seconds. On an EC2 micro instance (about 600 MB RAM) it takes about 40 minutes, and it slows down for random stretches of several seconds. To find the reason, I broke things down.

  1. First, this is how my evaluator works:

    • I have a main program, worker.py, which creates multiple threads.
    • The main thread pulls submissions (10 at a time) from a file (for the time being) and puts them in a global queue.
    • The side threads take submissions from the queue (each submission is evaluated by exactly one thread).
    • After a side thread takes a submission, it passes it to a function compile, which returns the submission's executable to that thread.
    • The thread then passes this executable to a function run, which runs it (in a sandbox with defined memory and time limits), writes its output to a file, and checks that output against the standard output.
    • Once the queue is empty, the main thread pulls another 10 submissions and places them in the queue.

  2. The functions compile and run:

    • The compile and run functions save the executable and the output, respectively, in files named like <thread_Name>.exe and <thread_Name>.txt, so every thread has its own files and nothing gets overwritten.
    • A thread goes on to the run function only if the status from the compile function was OK (the file compiled); otherwise it reports a compile error for that submission.

  3. Now the questions I have:

    • Is the slow execution on EC2 due to the resources it has, or due to Python's multi-threading? In my scripts the threads access shared globals: the queue (which I protect with locks), test.py (no lock), which the run function uses to compare the output with the standard output character by character (vimdiff-like), mysandbox.py (the libsandbox sandbox), and some other global variables. So is the slowdown due to Python's GIL? If so, why does it run fast on my local machine?
    • Also, for the time being I submit the same file, test.cpp (it adds two numbers and prints the result), 1000 times. When I deliberately introduce a compile error into this file and run my main program on EC2, it runs pretty fast. From that I deduced that compiling and running (the compile and run functions) take most of the time, not thread creation and management.
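
For reference, the pipeline described in items 1 and 2 can be sketched roughly as follows. This is a minimal Python 3 sketch, not the actual worker.py: `compile_submission` and `run_submission` are hypothetical placeholders for the real compile and sandboxed run steps, `NUM_WORKERS` is an assumed value, and the per-stage timers are an addition meant to show where the time goes on each machine.

```python
import queue
import threading
import time
from collections import defaultdict

BATCH_SIZE = 10   # the question pulls 10 submissions at a time
NUM_WORKERS = 4   # assumed; the question does not say how many threads it uses


def compile_submission(source, exe_path):
    """Hypothetical stand-in for the real compile step; True means it compiled."""
    time.sleep(0.001)
    return "error" not in source


def run_submission(exe_path, out_path):
    """Hypothetical stand-in for the sandboxed run plus output check."""
    time.sleep(0.001)
    return "AC"


def worker(jobs, results, timings, lock):
    name = threading.current_thread().name
    # Per-thread file names, as in the question, so threads never overwrite each other.
    exe_path, out_path = name + ".exe", name + ".txt"
    while True:
        source = jobs.get()
        if source is None:            # sentinel: no more work
            jobs.task_done()
            break
        t0 = time.perf_counter()
        ok = compile_submission(source, exe_path)
        t1 = time.perf_counter()
        verdict = run_submission(exe_path, out_path) if ok else "CE"
        t2 = time.perf_counter()
        with lock:                    # shared state is only touched under the lock
            results.append(verdict)
            timings["compile"] += t1 - t0
            timings["run"] += t2 - t1
        jobs.task_done()


def judge(submissions):
    jobs = queue.Queue()
    results, timings, lock = [], defaultdict(float), threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, timings, lock),
                         name="judge-%d" % i)
        for i in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    # Main thread: feed one batch, wait until it is drained, then feed the next.
    for start in range(0, len(submissions), BATCH_SIZE):
        for source in submissions[start:start + BATCH_SIZE]:
            jobs.put(source)
        jobs.join()
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return results, dict(timings)
```

Comparing the `compile` and `run` totals returned by `judge` on the local machine and on the EC2 instance would show which stage accounts for the extra time, which is the same conclusion the compile-error experiment points to.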

I know it's a broad question, but any help is really appreciated (or I will have to put a bounty on it, betting all my reputation :) ).


Answer 1:


Micro instances become extremely slow for sustained computational tasks (by design).

You wrote your code to be multi-threaded to take advantage of the entire "machine's" CPU resources for tasks like file retrieval and compilation, which is good practice for performance.

And while this makes sense on a physical machine or a virtual machine where you have guaranteed provisioned hardware resources, it doesn't make sense on a micro instance because of the way Amazon allocates resources.

Per Amazon's documentation, micro instances are designed for short-burst CPU operations only and will therefore experience huge bottlenecks imposed by Amazon itself if you try to use multiple threads that consume a lot of CPU:

If the application consumes more than your instance's allotted CPU resources, we temporarily limit the instance so it operates at a low CPU level. If your instance continues to use all of its allotted resources, its performance will degrade. We will increase the time that we limit its CPU level, thus increasing the time before the instance is allowed to burst again.

Take a look at the CPU usage graphs in that documentation for more details.
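
On a Linux guest, a hypervisor-imposed limit of this kind typically shows up as "steal" time, the eighth counter on the aggregate `cpu` line of `/proc/stat`. The following is a minimal sketch for measuring it between two samples; the function names are illustrative, and it assumes a Linux instance where `/proc/stat` is available.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def steal_fraction(before, after):
    """Fraction of CPU time reported as stolen between two 'cpu' line samples."""
    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Order: user nice system idle iowait irq softirq steal
        return [int(x) for x in parts[1:9]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    total = sum(deltas)
    return deltas[7] / total if total else 0.0
```

Calling `read_cpu_line()` twice a few seconds apart while the judge is running and passing both lines to `steal_fraction` gives a rough number; a value that stays high would be consistent with the instance being limited rather than with the Python code itself being slow.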

To prove that this is the issue, you could simply launch a small instance instead and run your judge software there. You should see a dramatic improvement, with performance similar to your desktop machine.
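
The same comparison can also be made without the judge at all, by timing a fixed CPU-bound workload over many rounds on each instance. The following is a minimal sketch under that assumption; `burn`, `time_rounds`, and `slowdown` are illustrative names, and the iteration counts are arbitrary.

```python
import time


def burn(iterations):
    """A fixed amount of pure-Python CPU work."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def time_rounds(rounds, iterations, pause=0.0):
    """Time the same workload repeatedly and return one duration per round."""
    durations = []
    for _ in range(rounds):
        start = time.perf_counter()
        burn(iterations)
        durations.append(time.perf_counter() - start)
        time.sleep(pause)
    return durations


def slowdown(durations):
    """Ratio of the slowest round to the fastest one."""
    return max(durations) / min(durations)
```

On a machine with steady CPU, something like `slowdown(time_rounds(300, 2000000))` should stay fairly close to 1. If the rounds become many times slower after the first stretch of sustained load, that matches the burst-then-limit behaviour the documentation describes.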

TL;DR When trying to use sustained CPU on a micro instance, it can become less powerful than an old Palm Treo.



Source: https://stackoverflow.com/questions/17476402/why-multi-threaded-python-program-slow-on-ec2-micro-instance
