Question
I want to know how I can collect all the URLs from the page source using Beautiful Soup, visit each of them one by one in the Google search results, and then move on to the next Google index pages.
Here is the URL https://www.google.com/search?q=site%3Awww.rashmi.com&rct=j that I want to crawl, and a screenshot is here: http://www.rashmi.com/blog/wp-content/uploads/2014/11/screencapture-www-google-com-search-1433026719960.png
Here is the code I'm trying:
def getPageLinks(page):
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'www.rashmi.com/' in url:
                links.append(url)
    return links

def Links(url):
    pUrl = urlparse(url)
    return parse_qs(pUrl.query)[0]

def PagesVisit(browser, printInfo):
    pageIndex = 1
    visited = []
    time.sleep(5)
    while True:
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&ei=50hqVdCqJozEogS7uoKADg" + str(pageIndex) + "&start=10&sa=N")
        pList = []
        count = 0
        pageIndex += 1
Answer 1:
Try this; it should work. The main changes are dropping the [0] index on the parse_qs() result in Links() and paginating with Google's start= offset (incremented by 10 each loop) instead of appending a page index to the URL.
import random
import time
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

def getPageLinks(page):
    # Collect every anchor on the page whose href points to www.rashmi.com
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url:
            if 'www.rashmi.com/' in url:
                links.append(url)
    return links

def Links(url):
    # Return the query parameters of a URL as a dict (no [0] index here)
    pUrl = urlparse(url)
    return parse_qs(pUrl.query)

def PagesVisit(browser, printInfo):
    start = 0
    visited = []
    time.sleep(5)
    while True:
        # Page through the results using Google's start= offset (10 results per page)
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&ei=V896VdiLEcPmUsK7gdAH&start=" + str(start) + "&sa=N")
        pList = []
        count = 0
        # Random sleep to make sure everything loads
        time.sleep(random.randint(1, 5))
        page = BeautifulSoup(browser.page_source, "html.parser")
        start += 10
        if start == 500:
            browser.close()
            break
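As posted, the fragment never actually calls getPageLinks or Links, and nothing invokes PagesVisit. Below is a minimal sketch of how the pieces might be wired together, reusing the functions defined above and assuming Selenium's Firefox driver; the collectAndVisit name and the use of print as the printInfo callback are assumptions, not part of the original answer.

from selenium import webdriver

def collectAndVisit(browser, printInfo):
    start = 0
    visited = []
    while True:
        browser.get("https://www.google.com/search?q=site:www.rashmi.com&start=" + str(start) + "&sa=N")
        time.sleep(random.randint(1, 5))
        page = BeautifulSoup(browser.page_source, "html.parser")
        for href in getPageLinks(page):
            # Google often wraps result links as /url?q=<target>; unwrap with Links(),
            # otherwise fall back to the href itself.
            target = Links(href).get('q', [href])[0]
            if target not in visited:
                visited.append(target)
                printInfo(target)
                browser.get(target)          # visit the result itself
                time.sleep(random.randint(1, 5))
        start += 10
        if start == 500:                     # stop after 50 result pages
            break
    browser.close()
    return visited

browser = webdriver.Firefox()
allLinks = collectAndVisit(browser, printInfo=print)

The same loop works with webdriver.Chrome(); the random sleeps give the results time to render and slow the requests down, though Google may still rate-limit or CAPTCHA automated queries.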
Source: https://stackoverflow.com/questions/30552470/how-to-collect-data-of-google-search-with-beautiful-soup-using-python