In addition to other answers, one reason Python and most other multi-paradigm languages are not well suited for true functional programming is because their compilers / virtual machines / run-times do not support functional optimization. This sort of optimization is achieved by the compiler understanding mathematical rules. For example, many programming languages support a map function or method. This is a fairly standard function that takes a function as one argument and a iterable as the second argument then applies that function to each element in the iterable.
Anyways it turns out that map( foo() , x ) * map( foo(), y ) is the same as map( foo(), x * y ). The latter case is actually faster than the former because the former performs two copies where the latter performs one.
Better functional languages recognize these mathematically based relationships and automatically perform the optimization. Languages that aren't dedicated to the functional paradigm will likely not optimize.