Create new TCP connections for every HTTP request in Python

Submitted by 本秂侑毒 on 2019-12-08 06:01:20

Question


For my college project I am trying to develop a Python-based traffic generator. I have created two CentOS machines on VMware and I am using one as my client and one as my server machine. I have used the IP aliasing technique to increase the number of clients and servers using just a single client/server machine. So far I have created 50 IP aliases on my client machine and 10 IP aliases on my server machine. I am also using the multiprocessing module to generate traffic concurrently from all 50 clients to all 10 servers. I have also created a few profiles (1 KB, 10 KB, 50 KB, 100 KB, 500 KB, 1 MB) on my server (in the /var/www/html directory, since I am using the Apache server), and I am using urllib2 to send requests to these profiles from my client machine. While running my scripts, when I monitor the number of TCP connections it is always < 50; I want to increase it to, say, 10000. How do I achieve this? I thought that if a new TCP connection were established for every new HTTP request, this goal could be achieved. Am I on the right path? If not, kindly guide me to the correct path.
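On the "new TCP connection per request" point: urllib2 does not keep connections alive — it sends a `Connection: close` header, so every `open()` call already uses a fresh TCP connection. A quick way to verify this is to count the source port of each incoming connection on a local test server; a minimal sketch (written with Python 3's urllib.request, which behaves the same way here, since urllib2 is Python 2-only — the server and URL are purely illustrative):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ports = []  # source port of each accepted TCP connection

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ports.append(self.client_address[1])  # record the client's ephemeral port
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass

# bind to an OS-assigned free port on localhost
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

for _ in range(3):
    urllib.request.urlopen(url).read()

server.shutdown()
# each request arrived on its own source port, i.e. its own TCP connection
print("distinct source ports:", len(set(ports)))
```

So the connection count you observe is limited by how many requests are in flight at once, not by connection reuse.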

'''
Traffic Generator Script:

Here I have used IP aliasing to create multiple clients on a single VM.
I have done the same on the server side to create multiple servers.
I have around 50 clients and 10 servers.
'''
import multiprocessing
import urllib2
import random
import myurllist    #list of all destination urls for all 10 servers
import time
import socbindtry   #script that binds various virtual/aliased client ips to the script

response_time=[]    #some shared variables
error_count=multiprocessing.Value('i',0)

def send_request3():    #function to send requests from alias client ip 1
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler3)    #bind to alias client ip 1
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        error_count.value=error_count.value+1

def send_request4():    #function to send requests from alias client ip 2
    opener=urllib2.build_opener(socbindtry.BindableHTTPHandler4)    #bind to alias client ip 2
    try:
        tstart=time.time()
        for i in range(len(myurllist.url)):
            x=random.choice(myurllist.url[i])
            opener.open(x).read()
            print "file downloaded:",x
            response_time.append(time.time()-tstart)
    except urllib2.URLError, e:
        error_count.value=error_count.value+1

#50 such functions are defined here for 50 clients

process=[]
def func():
    global process
    process.append(multiprocessing.Process(target=send_request3))
    process.append(multiprocessing.Process(target=send_request4))
    process.append(multiprocessing.Process(target=send_request5))
    process.append(multiprocessing.Process(target=send_request6))
    #append 50 functions here
    for i in range(len(process)):
        process[i].start()
    for i in range(len(process)):
        process[i].join()
    print "All work done..!!"

start=float(time.time())
func()
end=float(time.time())-start
print end

Answer 1:


For this sort of thing, you probably need to create a pool of worker processes. I don't know whether a pool of 10000 processes is viable in your use case (it is a very ambitious goal), but you should definitely investigate that idea.


The basic idea behind a pool is that you have M tasks to perform, with a maximum of N running simultaneously. When one of the workers has finished its task, it is ready to work on another until all the work is done. One major advantage is that if some tasks take a long time to complete, they will not block the overall progress of the work (as long as the number of "slow" processes is < N).

Along those lines, here is the basic structure your program would have using Pool:

from multiprocessing import Pool

import time
import random

def send_request(some_parameter):
    print("Do send_request", some_parameter)

    time.sleep(random.randint(1,10)) # simulate randomly long process

if __name__ == '__main__':
    pool = Pool(processes=100)

    for i in range(200):
        pool.apply_async(send_request, [i])


    print("Waiting")
    pool.close()
    pool.join()
    print("Done")

On my system, this sample program took something like 19 s (real time) to complete. On my Debian system, I was only able to spawn a little more than 1000 processes at a time before I reached the maximum number of open files (given the standard ulimit -n of 1024). You will have to raise that limit somehow if you need such a huge number of workers. And even if you do, as I said at first, 10000 concurrent processes is probably rather ambitious (at least using Python).
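The open-file limit mentioned above matters here because every socket consumes a file descriptor. On Unix, a process can inspect the limit, and raise its own soft limit up to the hard cap, from within Python via the standard resource module; a minimal sketch (raising the hard limit itself still requires root, e.g. via ulimit -n or /etc/security/limits.conf):

```python
import resource

# Each socket uses a file descriptor, so RLIMIT_NOFILE caps how many
# concurrent TCP connections (plus files, pipes, ...) one process can hold.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%s hard=%s" % (soft, hard))

# A process may raise its own soft limit up to the hard limit without
# special privileges; skip if the hard limit is reported as unlimited.
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print("soft limit now:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

With the soft limit raised, each worker process in the pool can then hold correspondingly more simultaneous connections before hitting "too many open files" errors.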



Source: https://stackoverflow.com/questions/29296557/create-new-tcp-connections-for-every-http-request-in-python
