Google search web scraping with a list of keywords in Python

Posted by 时光毁灭记忆、已成空白 on 2020-04-07 10:36:20

Question


I'm trying to scrape Google search results using a list of names as input and collect the data in a DataFrame. I have used Selenium for web scraping before, but I'm having a hard time writing a loop that runs each name in the list as a search query and scrapes the resulting page. Here is my Python code:

import pandas as pd
from urllib.parse import quote_plus
from selenium import webdriver

baseUrl = 'https://www.google.com/search?q='
pluseUrl = input('CEO: ')
url = baseUrl + quote_plus(pluseUrl)

browser = webdriver.Chrome(r"C:\Users\...\chromedriver.exe")
browser.get(url)

table = browser.find_elements_by_css_selector('div.ifM9O') 

df = pd.DataFrame(columns = ['ceo', 'value'])
values =[]


for row in table:
    ceo = str(([c.text for c in row.find_elements_by_css_selector('div.kno-ecr-pt.PZPZlf.gsmt.i8lZMc')])).strip('[]').strip("''")
    value = str(([c.text for c in row.find_elements_by_css_selector('div.Z1hOCe')])).strip('[]').strip("''")

ceo = pd.Series(ceo)
value = pd.Series(value)

df = df.assign(**{'ceo': ceo, 'value': value}) 


print(df)

Here is the result with Bill Gates as the input:

CEO: Bill gates
          ceo                                              value
0  Bill Gates  Born: October 28, 1955 (age 64 years), Seattle...

Any suggestions or recommendations will be appreciated.


Answer 1:


Try this:

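import pandas as pd
from urllib.parse import quote_plus
from selenium import webdriver

baseUrl = 'https://www.google.com/search?q='
browser = webdriver.Chrome(r"C:\Users\...\chromedriver.exe")
input_list = ["Bill Gates", "Elon Musk", "Warren Buffett"]
output = {}

def scrape_ceo_list(list_of_ceo):
    for ceo in list_of_ceo:
        # URL-encode the name so spaces and punctuation survive in the query string
        browser.get(baseUrl + quote_plus(ceo))

        # Query selectors, DataFrame construction, etc. as in the question's code,
        # producing a DataFrame named df for this search
        # ...

        output[ceo] = df

scrape_ceo_list(input_list)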

After scrape_ceo_list(input_list) has run, output is a dictionary of DataFrames, with the CEO names as dictionary keys.
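
If a single table is more convenient than a dictionary, the per-name DataFrames can be stacked afterwards. The following is only a sketch under assumptions: combine_results is a hypothetical helper name, the column name 'query' is made up for illustration, and the sample dictionary holds placeholder values standing in for real scraped results.

```python
import pandas as pd

# Stand-in for the `output` dictionary built by scrape_ceo_list();
# the values are placeholders, not scraped results.
output = {
    "Bill Gates": pd.DataFrame({"ceo": ["Bill Gates"], "value": ["placeholder text A"]}),
    "Elon Musk": pd.DataFrame({"ceo": ["Elon Musk"], "value": ["placeholder text B"]}),
}


def combine_results(frames_by_query):
    """Stack a {query: DataFrame} mapping into one DataFrame with a 'query' column.

    Hypothetical helper, not part of the original answer.
    """
    if not frames_by_query:
        # pd.concat raises ValueError on an empty mapping, so return an empty frame.
        return pd.DataFrame(columns=["query", "ceo", "value"])
    # Passing a dict to pd.concat uses its keys as the outer index level.
    combined = pd.concat(frames_by_query, names=["query"])
    # Turn the outer level into a regular column, then discard the per-frame row index.
    return combined.reset_index(level="query").reset_index(drop=True)


all_results = combine_results(output)
print(all_results)
```

Keeping the search term in its own column means a name that returns no knowledge panel can still be told apart from one that was never searched, provided an empty DataFrame is stored for it.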



Source: https://stackoverflow.com/questions/60628327/google-search-web-scraping-with-a-list-of-key-words-in-python
