Question
Basically, I have a URL that I'm hitting to get some XML data. I can't disclose the endpoint, but running:
curl -v "http://my-url.com/some/endpoint"
returns a 200 OK and the content pretty much instantly.
Using the requests module by Kenneth Reitz, I have a POST request and a GET request that both take 30 seconds to return content.
If I use it this way:
import requests
from timeit import Timer

myurl = "http://my-url.com/some/endpoint"  # same endpoint as in the curl command above
t = Timer(lambda: requests.get(myurl).content)
print t.timeit(number=1)
30.2136261463
it takes 30.2 seconds on average, each time. The same goes for my POST request. If I don't ask for the content and only check the status_code of the response, I get the same situation, unless I pass stream=True, in which case I get the response quickly, but not the content.
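To illustrate, here is a rough sketch of what I mean (the URL is just the placeholder from above, not the real endpoint): with stream=True the headers come back almost immediately, but reading .content still takes the full ~30 seconds.

import requests
from timeit import Timer

myurl = "http://my-url.com/some/endpoint"  # placeholder endpoint

t_headers = Timer(lambda: requests.get(myurl, stream=True).status_code)
t_content = Timer(lambda: requests.get(myurl).content)

print t_headers.timeit(number=1)  # fast: only the response headers are read
print t_content.timeit(number=1)  # slow: the full body is downloaded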
My confusion is with the curl command... with it I get both the response and the content in under 10 ms. I tried faking the user agent in my Python test, tried passing numerous arguments to the get() function, etc. There must be some major difference between how curl and python-requests make requests that I am not aware of. I am a newbie, so I do apologise if I am missing something obvious.
I would also like to mention that I have tried this on multiple machines, with multiple versions of curl and Python, and even with some REST clients like Postman, etc. Only curl performs lightning fast, and it is hitting the same endpoint in every case, BTW. I understand one of the options is to do a subprocess call to curl within my test, but... is that a good idea?
EDIT: I care about the content. I am aware I can get the response code quickly (headers).
Thanks in advance,
Tihomir.
UPDATE:
I am now using pycurl2 in my test, so this is just a workaround, as I was hoping I could use python-requests for everything. Still curious as to why curl is so much faster.
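Roughly, the workaround looks like this (a sketch with plain pycurl calls; pycurl2 is a fork of pycurl and the calls are essentially the same, and the URL is again just the placeholder):

import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "http://my-url.com/some/endpoint")  # placeholder endpoint
c.setopt(c.WRITEFUNCTION, buf.write)  # collect the response body into the buffer
c.perform()
c.close()

body = buf.getvalue()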
Answer 1:
Since this question is not generating any interest at all, I am going to accept my own workaround solution - which involves using pycurl2 instead of requests for the problematic requests.
Only 2 of my requests are slow, and doing this fixed my issue, but it's not the solution I was hoping for.
NOTE: I am not saying in any way that requests is slow or bad. This seemed to be an issue with gzip compression and GlassFish serving gzipped data with a buggy length. I just wanted to know why it's not affecting curl/wget.
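If anyone wants to check whether gzip handling is the culprit in a similar setup, a quick diagnostic sketch (placeholder URL again) is to ask the server for an uncompressed body and see if the delay disappears:

import requests

# Request an uncompressed response; if this comes back quickly, the server's
# gzip/content-length handling is the likely cause of the slowdown.
r = requests.get("http://my-url.com/some/endpoint",
                 headers={"Accept-Encoding": "identity"})
print r.status_code
print len(r.content)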
Answer 2:
One thing to do would be to use:
requests.get(url, stream=False)
instead of what you've posted. See this link for more:
http://docs.python-requests.org/en/latest/user/advanced/
DISCUSSION
- Curl is an executable.
- Python is an interpreted language.
As a result, Python has a much slower "startup" time than curl, which contributes to its relatively slow speed, even though the work here is I/O-bound rather than CPU-bound. This is one of the trade-offs of using an interpreted language. But generally, while you get relatively slow execution, the savings in development and maintenance time far outweigh that "loss". (Note: I said generally.)
One possible solution is, as you say, to use Python to wrap curl in a script. That is not a bad idea, but without care it can lead to disastrous problems (depending on usage, say deleting files), as there are race conditions to consider.
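A minimal sketch of such a wrapper (assuming curl is on the PATH; the URL is just a placeholder):

import subprocess

def curl_get(url):
    # Shell out to curl; -s suppresses the progress meter, and the
    # captured stdout is the response body.
    return subprocess.check_output(["curl", "-s", url])

body = curl_get("http://my-url.com/some/endpoint")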
Another approach is to translate the original Python code into a language like C/C++, so you can compile it and get the near-equivalent performance that you desire. Examples are shedSkin and Cython.
Source: https://stackoverflow.com/questions/18996177/python-requests-module-get-post-various-rest-clients-requests-taking-longer-t