Web scraping with urlopen in Python

小蘑菇 2021-01-06 07:09

I am trying to get the data from this website: http://www.boursorama.com/includes/cours/last_transactions.phtml?symbole=1xEURUS

It seems like urlopen doesn't get the

3 answers
  •  无人及你
    2021-01-06 07:29

    Personally, I write:

    # Python 2.7
    
    import urllib
    
    url = 'http://www.boursorama.com/includes/cours/last_transactions.phtml?symbole=1xEURUS'
    sock = urllib.urlopen(url)
    content = sock.read() 
    sock.close()
    
    print content
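
For Python 3, where urllib.urlopen has moved to urllib.request.urlopen, a rough equivalent of the snippet above might look like the following. The helper name fetch_text and the UTF-8 fallback are assumptions made for illustration, not part of the original answer, and the page's real encoding may differ.

```python
# Python 3 sketch; fetch_text is a hypothetical helper name.
from urllib.request import urlopen


def fetch_text(url, timeout=30, fallback_encoding='utf-8'):
    """Download url and return its body decoded to str."""
    # The response is a context manager, so it is closed automatically.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3
        # Prefer the charset declared in the headers; the fallback is an assumption.
        encoding = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(encoding, errors='replace')
```

Calling fetch_text with the URL from the question and passing the result to print() would then mirror the Python 2.7 version.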
    

    And if you speak French: welcome to stackoverflow.com!

    update 1

    In fact, I now prefer the following code, because it is faster:

    # Python 2.7
    
    import httplib
    
    conn = httplib.HTTPConnection(host='www.boursorama.com', timeout=30)
    
    req = '/includes/cours/last_transactions.phtml?symbole=1xEURUS'
    
    try:
        conn.request('GET', req)
        content = conn.getresponse().read()
    except (httplib.HTTPException, IOError) as e:
        # socket.error is a subclass of IOError since Python 2.6
        print 'connection failed:', e
    else:
        print content
    finally:
        conn.close()
    

    Changing httplib to http.client and turning the print statements into print() calls should be enough to adapt this code to Python 3.
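
As a minimal sketch of that Python 3 adaptation, the code below uses http.client. The helper names split_url and fetch are hypothetical, and the error handling is only one possible choice; it assumes a plain-HTTP URL like the one in the question.

```python
# Python 3 sketch; split_url and fetch are hypothetical helper names.
import http.client
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, request_target) in the form http.client expects."""
    parts = urlsplit(url)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    return parts.netloc, target


def fetch(url, timeout=30):
    """Send a GET request over plain HTTP and return the body as bytes."""
    host, target = split_url(url)
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request('GET', target)
        return conn.getresponse().read()
    except OSError as exc:  # connection errors and timeouts
        print('connection failed:', exc)
        return None
    finally:
        conn.close()
```

Note that fetch returns bytes, not str, which matters for any regex applied to the result afterwards.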


    I confirm that both snippets return the page source, and that it contains the data you are interested in:

            11:57:44
            1.4486
            0

            11:57:43
            1.4486
            0

    update 2

    Adding the following snippet to the above code will allow you to extract the data I suppose you want:

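    import re
    
    # Show every line with its index and repr, to reveal the exact whitespace
    for i, line in enumerate(content.splitlines(True)):
        print str(i) + ' ' + repr(line)
    
    print '\n\n'
    
    # Raw strings pass the backslashes through to the regex engine unchanged
    regx = re.compile(r'\t\t\t\t\t\t(\d\d:\d\d:\d\d)\r\n'
                      r'\t\t\t\t\t\t([\d.]+)\r\n'
                      r'\t\t\t\t\t\t(\d+)\r\n')
    
    print regx.findall(content)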
    

    result (only the end)

    .......................................
    .......................................
    .......................................
    .......................................
    98 'window.config.graphics = {};\n'
    99 'window.config.accordions = {};\n'
    100 '\n'
    101 "window.addEvent('domready', function(){\n"
    102 '});\n'
    103 '\n'
    .......................................
    114 '\n'
    .......................................
    128 '\n'
    129 ''

    [('12:25:36', '1.4478', '0'), ('12:25:33', '1.4478', '0'), ('12:25:31', '1.4478', '0'), ('12:25:30', '1.4478', '0'), ('12:25:30', '1.4478', '0'), ('12:25:29', '1.4478', '0')]

    I hope you don't plan to "play" at trading on the Forex: it's one of the best ways to lose money rapidly.

    update 3

    Sorry! I forgot that you are using Python 3. So I think you must define the regex like this:

    regx = re.compile(b'\t\t\t\t\t......)

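    that is to say, with b before the string, because read() returns bytes in Python 3. Otherwise you'll get an error ("TypeError: cannot use a string pattern on a bytes-like object") like the one in this question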
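
To illustrate the point about the b prefix, here is a small Python 3 sketch. The sample bytes are made up to mimic the tab-and-CRLF layout that the regex in the answer targets, and extract_rows is a hypothetical helper name; the real page content may look different.

```python
import re

# Bytes pattern (note the b prefix): six tabs, a value, then \r\n, three times.
regx = re.compile(rb'\t{6}(\d\d:\d\d:\d\d)\r\n'
                  rb'\t{6}([\d.]+)\r\n'
                  rb'\t{6}(\d+)\r\n')

# Made-up sample standing in for the bytes returned by read().
sample = (b'header line\r\n'
          b'\t\t\t\t\t\t12:25:36\r\n'
          b'\t\t\t\t\t\t1.4478\r\n'
          b'\t\t\t\t\t\t0\r\n'
          b'footer line\r\n')


def extract_rows(content):
    """Return the three captured fields of each match, decoded to str."""
    return [tuple(field.decode('ascii') for field in match)
            for match in regx.findall(content)]


print(extract_rows(sample))  # [('12:25:36', '1.4478', '0')]
```

An alternative is to decode the downloaded bytes to str first and keep an ordinary str pattern; which is preferable depends on whether the page's encoding is known.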
