Scrape Google resultStats with Python [closed]

╄→尐↘猪︶ㄣ submitted on 2019-12-01 18:28:27

If you haven't solved this problem yet: BeautifulSoup most likely fails to find anything because the resultStats div never appears in the soup. Your Request(page_google) returns only JavaScript, not the search results that the JavaScript loads dynamically. You can verify this by adding a

print(soup)

call to your code; the resultStats div will not appear in the output.
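If you would rather not scan the whole dump by eye, a minimal sketch along these lines could check whether a given id is present in a response. It uses only the standard library's html.parser instead of BeautifulSoup, and the names IdCollector and has_element_id as well as the two sample HTML strings are made up for illustration; they are not real Google output.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects the id attribute of every start tag in a document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html, element_id):
    """Return True if any tag in `html` carries id=element_id."""
    collector = IdCollector()
    collector.feed(html)
    return element_id in collector.ids


# Hypothetical sample responses, not real Google output.
js_only = "<html><head><script>var x = 1;</script></head><body></body></html>"
with_stats = (
    '<html><body><div class="sd" id="resultStats">'
    "Ungefähr 1.060.000 Ergebnisse</div></body></html>"
)

print(has_element_id(js_only, "resultStats"))
print(has_element_id(with_stats, "resultStats"))
```

With a real response you would pass the decoded page text, for example the result of urlopen(...).read().decode('utf-8', 'replace').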

The following code:

from urllib.parse import quote_plus
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

query = 'pokerbonus'
url = "http://www.google.de/search?q=%s" % quote_plus(query)

# Identify as a desktop Firefox browser via the User-Agent header
req_google = Request(url)
req_google.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
html_google = urlopen(req_google).read()

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html_google, 'html.parser')
scounttext = soup.find('div', id='resultStats')
print(scounttext)

will print:

<div class="sd" id="resultStats">Ungefähr 1.060.000 Ergebnisse</div>
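If the goal is the number itself rather than the whole div, the text can be reduced to an integer. The following is only a sketch: parse_result_count is a hypothetical helper, and it assumes the count is the first number in the text and that any dots, commas, or spaces inside it are thousands separators.

```python
import re


def parse_result_count(stats_text):
    """Extract the first number from a resultStats string as an int.

    Assumes separators inside the number ('.', ',' or whitespace) are
    thousands separators. Returns None if the text contains no digits.
    """
    match = re.search(r"\d[\d.,\s]*", stats_text)
    if match is None:
        return None
    # Drop everything that is not a digit, e.g. "1.060.000 " -> "1060000"
    return int(re.sub(r"\D", "", match.group(0)))


print(parse_result_count("Ungefähr 1.060.000 Ergebnisse"))
```

With the soup from the code above, the input would be scounttext.get_text(), provided the div was found.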

Lastly, a tool like Selenium WebDriver might be a better way to solve this, as Google does not allow bots to scrape search results.
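A rough sketch of the Selenium route might look like the following. It assumes Selenium 4 and a local Firefox installation; build_search_url and fetch_result_stats are hypothetical helper names, and the resultStats id is simply the one from the output above, which may not match Google's current markup.

```python
from urllib.parse import quote_plus


def build_search_url(query):
    """Build the same search URL as the urllib-based example."""
    return "http://www.google.de/search?q=%s" % quote_plus(query)


def fetch_result_stats(query):
    """Load the results page in a real browser and return the resultStats text.

    Assumes Selenium 4 and a local Firefox install. The element id
    'resultStats' is an assumption carried over from the example output.
    """
    # Imported lazily so this module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get(build_search_url(query))
        # Raises NoSuchElementException if the element is not present.
        return driver.find_element(By.ID, "resultStats").text
    finally:
        # Always close the browser, even if the lookup fails.
        driver.quit()


# Example call (opens a browser window):
# print(fetch_result_stats("pokerbonus"))
```

If the element is only added some time after the page loads, an explicit wait before the lookup may be needed.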
