soup.select('.r a') in f'https://google.com/search?q={query}' brings back empty list in Python BeautifulSoup. **NOT A DUPLICATE**

半城伤御伤魂 submitted on 2019-11-26 17:52:44

Question


The "I'm Feeling Lucky!" project in the "Automate the Boring Stuff with Python" ebook no longer works with the code the author provided.

Specifically, the line linkElems = soup.select('.r a') now returns an empty list.

I've already tried the solution from "soup.select('.r a') in 'https://www.google.com/#q=vigilante+mic' gives empty list in python BeautifulSoup", and I'm already using the same search URL format.

import webbrowser, requests, bs4

def im_feeling_lucky():

    # Make search query look like Google's
    search = '+'.join(input('Search Google: ').split(" "))

    # Pull html from Google
    print('Googling...') # display text while downloading the Google page
    res = requests.get(f'https://google.com/search?q={search}&oq={search}')
    res.raise_for_status()

    # Retrieve top search result link
    soup = bs4.BeautifulSoup(res.text, features='lxml')


    # Open a browser tab for each result.
    linkElems = soup.select('.r')  # Returns empty list
    numOpen = min(5, len(linkElems))
    print('Before for loop')
    for i in range(numOpen):
        webbrowser.open(f'http://google.com{linkElems[i].get("href")}')

linkElems ends up as an empty list [], so the loop never runs and the program does nothing past that point.
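
A side note on the query-building step in the code above: joining the words with '+' only takes care of spaces. The sketch below does the same step with the standard library's urllib.parse.urlencode, which also escapes characters such as & and +. The helper name build_search_url is made up for illustration, and this does not by itself explain the empty list.

```python
from urllib.parse import urlencode

def build_search_url(query, base='https://google.com/search'):
    """Return a search URL with the query percent-encoded (hypothetical helper)."""
    # urlencode turns spaces into '+' and escapes reserved characters.
    params = urlencode({'q': query})
    return f'{base}?{params}'

print(build_search_url('vigilante mic'))
```

With requests, passing params={'q': query} to requests.get achieves the same encoding without building the string by hand.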


Answer 1:


I had the same problem while working through that book and found a fix.

Replacing

soup.select('.r a')

with

soup.select('div#main > div > div > div > a')

solves the issue.

The following code works:

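import webbrowser, requests, bs4, sys

print('Googling...')
# Let requests URL-encode the search terms given on the command line.
res = requests.get('https://google.com/search', params={'q': ' '.join(sys.argv[1:])})
res.raise_for_status()

# Name the parser explicitly so bs4 does not have to guess one.
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Select the result links and open at most five of them.
linkElems = soup.select('div#main > div > div > div > a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get("href"))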

The code above takes the search terms from the command-line arguments.
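
One caveat about prepending http://google.com to each href: this works because the hrefs are relative. In the markup I have seen served to non-browser clients, result links are redirects of the form /url?q=<target>&..., but that is an assumption about Google's output, not something the book or this answer guarantees. Under that assumption, a hypothetical helper (result_url is my own name) could pull the target out directly:

```python
from urllib.parse import urlsplit, parse_qs, urljoin

def result_url(href, base='https://www.google.com'):
    """Return the destination of a result link (assumes the '/url?q=...' form)."""
    parts = urlsplit(href)
    if parts.path == '/url':
        # parse_qs maps each query key to a list of decoded values.
        target = parse_qs(parts.query).get('q')
        if target:
            return target[0]
    # Anything else: resolve it against the base, as the code above does.
    return urljoin(base, href)

print(result_url('/url?q=https://example.com/page&sa=U&ved=abc'))
```

Links that do not match the assumed form are resolved against the base URL, so the helper behaves like the plain string concatenation in that case.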




Answer 2:


I took a different route. I saved the HTML from the request, opened that page, and inspected the elements. It turns out that the page served to my Python request is different from the one I get when I open the same URL in the Chrome browser. I identified the div class that appears to mark a result and substituted it for .r; in my case it was .kCrYT

#! python3

# lucky.py - Opens several Google Search results.

import requests, sys, webbrowser, bs4

print('Googling...') # display text while the google page is downloading

url = 'http://www.google.com.au/search?q=' + ' '.join(sys.argv[1:])
url = url.replace(' ','+')


res = requests.get(url)
res.raise_for_status()


# Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text, 'html.parser')


# Get all of the 'a' tags that are direct children of an element with the class 'kCrYT' (which are the results).
linkElems = soup.select('.kCrYT > a')

# Open a browser tab for each result.
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open_new_tab('http://google.com.au' + linkElems[i].get('href'))
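
The manual inspection described in this answer can be partly automated. Here is a rough sketch, assuming the response HTML is already held in a string (for example res.text, or the contents of the saved file): it counts the class names on the direct parents of a tags, so that a frequently repeated wrapper class stands out. The function name parent_class_counts and the sample markup are made up for illustration, and generated class names such as kCrYT are likely to change over time.

```python
from collections import Counter
import bs4

def parent_class_counts(html):
    """Count class names found on the direct parents of <a href=...> tags."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    counts = Counter()
    for link in soup.find_all('a', href=True):
        # .get('class') returns a list of class names, or None if absent.
        for name in link.parent.get('class') or []:
            counts[name] += 1
    return counts

# Illustrative markup only; not a real Google response.
sample = '''
<div id="main">
  <div class="kCrYT"><a href="/url?q=https://example.com/">One</a></div>
  <div class="kCrYT"><a href="/url?q=https://example.org/">Two</a></div>
  <div class="nav"><a href="/search?q=x&start=10">Next</a></div>
</div>
'''
print(parent_class_counts(sample).most_common())
```

The most common class is only a candidate; it still needs a look at the saved page to confirm that it wraps results rather than navigation links.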



Answer 3:


Websites such as Google can serve different HTML to different User-Agents (the User-Agent header is how a client identifies itself to the website). Another solution to your problem is to send a browser User-Agent, so that the HTML you receive is the same as what "view page source" shows in your browser. The following code just prints the list of Google search result URLs instead of opening them as the book does, but it still illustrates the point.

#! python3
# lucky.py - Prints several Google search result URLs.

import requests, sys, webbrowser, bs4
print('Please enter your search term:')
searchTerm = input()
print('Googling...')    # display text while downloading the Google page

# Pass the term via params so that requests URL-encodes it.
url = 'http://google.com/search'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

res = requests.get(url, params={'q': searchTerm}, headers=headers)
res.raise_for_status()


# Retrieve top search results links.
soup = bs4.BeautifulSoup(res.content, 'html.parser')

# Print the URL of each result.
linkElems = soup.select('.r > a')   # Used '.r > a' instead of '.r a' because
numOpen = min(5, len(linkElems))    # there are many href after div class="r"
for i in range(numOpen):
    # webbrowser.open('http://google.com' + linkElems[i].get('href'))
    print(linkElems[i].get('href'))
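
To see why the header matters, it can help to check what requests would send when no User-Agent is set. The sketch below only prepares the request and never sends it, so it needs no network access. The function name prepared_user_agent and the short 'Mozilla/5.0' string are placeholders of mine, not part of the answer.

```python
import requests

def prepared_user_agent(headers=None):
    """Return the User-Agent requests would send, without sending anything."""
    session = requests.Session()
    request = requests.Request('GET', 'https://www.google.com/search',
                               params={'q': 'test'}, headers=headers)
    # prepare_request merges the session's default headers with ours.
    prepared = session.prepare_request(request)
    return prepared.headers['User-Agent']

print(prepared_user_agent())                               # library default
print(prepared_user_agent({'User-Agent': 'Mozilla/5.0'}))  # overridden
```

The first line printed starts with python-requests/, which identifies the client as a script rather than a browser; that is consistent with the different markup described in the answers above.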


Source: https://stackoverflow.com/questions/56664934/soup-select-r-a-in-fhttps-google-com-searchq-query-brings-back-empty
