soup.select('.r a') on f'https://google.com/search?q={query}' returns an empty list in Python BeautifulSoup. **NOT A DUPLICATE**

失恋的感觉 2020-11-30 14:42

The "I'm Feeling Lucky!" project in the "Automate the Boring Stuff with Python" ebook no longer works with the code the book provides.

Specifically, the linkElems = s

3 answers
  •  心在旅途
    2020-11-30 15:12

    Websites such as Google serve different HTML depending on the User-Agent header of the request, which is how the site identifies the client. Another solution to your problem is to send a browser User-Agent, so that the HTML you receive matches what "view page source" shows in your browser. The following code only prints the list of Google search result URLs rather than opening them as in the book you referenced, but it is enough to show the point.

    #! python3
    # lucky.py - Prints the links of the top Google search results.
    
    import requests, webbrowser, bs4
    
    print('Please enter your search term:')
    searchTerm = input()
    print('Googling...')    # display text while downloading the Google page
    
    url = 'https://google.com/search'
    # Send a browser User-Agent so Google returns the same HTML a browser gets.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    
    # Pass the search term via params so that requests URL-encodes it.
    res = requests.get(url, params={'q': searchTerm}, headers=headers)
    res.raise_for_status()
    
    # Retrieve top search results links; name the parser explicitly.
    soup = bs4.BeautifulSoup(res.content, 'html.parser')
    
    # Print each result link (uncomment the webbrowser line to open tabs instead).
    linkElems = soup.select('.r > a')   # Used '.r > a' instead of '.r a' because
    numOpen = min(5, len(linkElems))    # there are many href after div class="r"
    for i in range(numOpen):
        # webbrowser.open('http://google.com' + linkElems[i].get('href'))
        print(linkElems[i].get('href'))
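
As a complement to the answer above, the two pieces that matter here, the encoded query string and the User-Agent header, can be checked without sending anything to Google. The following is a minimal sketch using only the standard library; `build_search_request` and `BROWSER_UA` are hypothetical names introduced for illustration, and the User-Agent string is simply the one quoted in the answer.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the browser User-Agent string quoted in the answer above.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/39.0.2171.95 Safari/537.36')


def build_search_request(search_term, user_agent=BROWSER_UA):
    """Return an unsent Request for a Google search (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the search term.
    query = urlencode({'q': search_term})
    return Request('https://google.com/search?' + query,
                   headers={'User-Agent': user_agent})


req = build_search_request('automate the boring stuff')
print(req.full_url)                   # the encoded search URL
# urllib stores header names capitalized as 'User-agent'.
print(req.get_header('User-agent'))   # the header the server would see
```

Printing the request before sending it is a cheap way to confirm that spaces and special characters in the search term were escaped and that the browser User-Agent is attached.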
    
