I heard threading is not very efficient in Python (compared to other languages).
Is this true? If so, how can a Python programmer overcome this?
CPython uses reference counting with a cyclic garbage collector for memory management. To make this practical, it has a mechanism called the "global interpreter lock" which protects the reference counting system, along with all the other interpreter internals.
On a single-core machine, this doesn't matter - all threading is faked via time-slicing, anyway. On a multiple core machine, it makes a difference: a CPU bound Python program running on CPython won't make use of all of the available cores.
There are a number of possible responses to this:
If threads are being used to convert blocking IO to a non-blocking operation, then that works just fine in standard CPython without any special modifications - IO operations already release the GIL.