How Can I Export Scraped Data to Excel Horizontally?

Submitted by 怎甘沉沦 on 2020-01-05 07:21:19

Question


I'm relatively new to Python. Using this site as an example, I'm trying to scrape the restaurants' information but I'm not sure how to pivot this data horizontally when it's being read vertically. I'd like the Excel sheet to have six columns as follows: Name, Street, City, State, Zip, Phone. This is the code I'm using:

from selenium import webdriver
from bs4 import BeautifulSoup
from urllib.request import urlopen
import time

driver = webdriver.Chrome(executable_path=r"C:\Downloads\chromedriver_win32\chromedriver.exe")


driver.get('https://www.restaurant.com/listing?&&st=KS&p=KS&p=PA&page=1&&searchradius=50&loc=10021')
time.sleep(10)
with urlopen(driver.current_url) as response:
    soup = BeautifulSoup(response, 'html.parser')
    pageList = soup.findAll("div", attrs={"class": {"details"}})
    list_of_inner_text = [x.text for x in pageList]
    text = ', '.join(list_of_inner_text)
    print(text)

Thanks

EDIT: Based on feedback, here's what I would expect for the first five restaurants on this page (link: FirstFiveRestaurants).


Answer 1:


Here is one way. Your mileage may vary on other pages.

This line

details = [re.sub(r'\s{2,}|[,]', '',i) for i in restuarant.select_one('h3 + p').text.strip().split('\n') if i!='']

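generates every output column except the name: it splits the text of the p tag on '\n', then strips commas and runs of two or more whitespace characters from each piece. The name is read separately from the h3 link and inserted at the front of the list.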
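To see what that line does in isolation, here is a minimal sketch that runs the same split-and-clean step on a made-up block of text. The sample address, the restaurant name, and the helper name split_details are assumptions for illustration only; the real page's text may be laid out differently.

```python
import re

# Hypothetical text of one listing's <p> tag; the real page may differ.
sample = """
    123 Main St
    New York,
    NY
    10021
    (212) 555-0100
"""

def split_details(text):
    """Return one cleaned field per non-empty line of the text block."""
    # Same regex as the answer: drop commas and runs of 2+ whitespace characters.
    return [re.sub(r'\s{2,}|[,]', '', line)
            for line in text.strip().split('\n') if line != '']

fields = split_details(sample)
# Prepend a (hypothetical) name so the row matches the six target columns.
row = ['Example Diner'] + fields
print(row)
```

Each restaurant then becomes one list of six values, which is what lets the data be laid out as one row per restaurant.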

import re
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome(executable_path=r"C:\Users\User\Documents\chromedriver.exe")
driver.get('https://www.restaurant.com/listing?&&st=KS&p=KS&p=PA&page=1&&searchradius=50&loc=10021')
# Wait up to 10 seconds for the listings to be present in the DOM
WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".restaurants")))
# Parse the rendered page source rather than re-downloading the URL
soup = BeautifulSoup(driver.page_source, 'html.parser')
restuarants = soup.select('.restaurants')
results = []

for restuarant in restuarants:
    # One cleaned field per line of the <p> that follows the <h3>
    details = [re.sub(r'\s{2,}|[,]', '',i) for i in restuarant.select_one('h3 + p').text.strip().split('\n') if i!='']
    # The name comes from the link inside the <h3>; put it first
    details.insert(0, restuarant.select_one('h3 a').text)
    results.append(details)

# One row per restaurant, one column per field
df = pd.DataFrame(results, columns= ['Name','Address', 'City', 'State', 'Zip', 'Phone'])
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
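Since results can vary from page to page, a listing whose text splits into more or fewer than five pieces would give a row that no longer lines up with the six column names. One possible guard, sketched below, is to pad or truncate every row before building the DataFrame. The helper name normalize_row and the sample rows are assumptions for illustration, not part of the original answer.

```python
COLUMNS = ['Name', 'Address', 'City', 'State', 'Zip', 'Phone']

def normalize_row(row, width=len(COLUMNS), fill=''):
    """Pad short rows with `fill` and truncate long rows to `width` items."""
    return (list(row) + [fill] * width)[:width]

# Hypothetical scraped rows: one complete, one missing its phone number.
results = [
    ['Example Diner', '123 Main St', 'New York', 'NY', '10021', '(212) 555-0100'],
    ['Sample Cafe', '9 Side Ave', 'Brooklyn', 'NY', '11201'],
]
clean = [normalize_row(r) for r in results]
for r in clean:
    print(r)
```

The cleaned list can then be passed to pd.DataFrame with the same column names. Note that padding only keeps the shape consistent; if a missing piece is in the middle of the address, the remaining values will still shift left.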



Source: https://stackoverflow.com/questions/57677808/how-can-i-export-scraped-data-to-excel-horizontally
