beautifulsoup

Beautiful Soup returning nothing

China☆狼群 submitted on 2020-01-05 10:11:21

Question: Hi, I am working on a project for my school that involves scraping HTML. However, I get None returned when I look for tables. Here is the segment that experiences the issue; if you need more info I'd be happy to give it to you:

from bs4 import BeautifulSoup
import urllib2
import datetime

# This section determines the date of the next Saturday, which will go onto the end of the URL
d = datetime.date.today()
while d.weekday() != 5:
    d += datetime.timedelta(1)
# temporary logic for testing when
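The excerpt cuts off before the part that actually fetches the page and searches for tables, so the snippet below is only a minimal diagnostic sketch (not the asker's code, and with a placeholder URL): it checks whether any table tags are present in the downloaded HTML at all. If none are found, the tables are most likely rendered client-side by JavaScript and will never be visible to Beautiful Soup.

import urllib2  # Python 2, matching the question's imports
from bs4 import BeautifulSoup

url = "http://example.com/schedule"  # placeholder; the real URL ends with the computed Saturday date
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")
if not tables:
    # Nothing matched: the served markup may differ from what the browser
    # shows, or the tables may be added client-side by JavaScript.
    print("No <table> tags found in the downloaded HTML")
else:
    print("Found %d table(s)" % len(tables))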

Cleaning text string after getting body text using Beautifulsoup

白昼怎懂夜的黑 submitted on 2020-01-05 09:34:58

Question: I'm trying to get text from articles on various webpages and write them out as clean text documents. I don't want all visible text, because that often includes irrelevant links on the side of webpages. I'm using Beautiful Soup to extract the information from pages, but extra links, not just those on the side of the page but also ones that appear in the middle of the body text and at the bottom of the articles, sometimes make it into the final product. Does anyone know how to deal with the problem of extra
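One common approach, sketched below under assumptions (the tag and class choices are illustrative and vary from site to site), is to delete the tags that usually hold navigation and link boxes before extracting text, and then read only the paragraph tags inside the article container.

import requests
from bs4 import BeautifulSoup

html = requests.get("http://example.com/article").text  # placeholder URL
soup = BeautifulSoup(html, "html.parser")

# Drop elements that usually hold navigation, ads, or related-links boxes.
for tag in soup(["script", "style", "nav", "aside", "header", "footer"]):
    tag.decompose()

# Read text only from <p> tags inside the (assumed) <article> container.
article = soup.find("article") or soup
paragraphs = [p.get_text(" ", strip=True) for p in article.find_all("p")]
clean_text = "\n\n".join(paragraphs)
print(clean_text)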

What's needed to get BeautifulSoup4+lxml to work with cx_freeze?

怎甘沉沦 submitted on 2020-01-05 08:51:50

Question: Summary: I have a wxPython/bs4 app that I'm building into an exe with cx_freeze. The build succeeds with no errors, but trying to run the EXE results in a FeatureNotFound error from BeautifulSoup4: it complains that I don't have the lxml library installed. I've since stripped the program down to its minimal state and still get the error. Has anyone else had success building a bs4 app with cx_freeze? Please take a look at the details below and let me know of any ideas you may have. Thanks,
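A frequently suggested direction, shown here only as a sketch, is to tell cx_Freeze explicitly to bundle lxml and bs4, since the freezer can miss lxml's compiled submodules. The module names in the includes list and the project metadata are common suggestions and placeholders, not settings verified against this particular app.

from cx_Freeze import setup, Executable

build_exe_options = {
    "packages": ["lxml", "bs4"],          # force-include both packages
    "includes": ["lxml._elementpath"],    # submodule the freezer often misses
}

setup(
    name="myapp",                         # placeholder project metadata
    version="0.1",
    options={"build_exe": build_exe_options},
    executables=[Executable("main.py")],  # placeholder entry script
)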

Generating URLs for Yahoo and Bing scraping for multiple pages with Python and BeautifulSoup

家住魔仙堡 submitted on 2020-01-05 08:27:29

Question: I want to scrape news from different sources. I found a way to generate the URL for scraping multiple pages from Google, but I think there is a way to generate a much shorter link. Can you please tell me how to generate the URLs for scraping multiple pages of Bing and Yahoo news, and also, is there a way to make the Google URL shorter? This is the code for Google:

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML,
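As a sketch of one way to keep the links short, the paginated query URLs can be built with urlencode rather than string concatenation. The pagination parameter names below (Bing's first offset, Yahoo's b offset) are assumptions based on how the engines' result pages paginate in a browser; confirm them against the address bar on page 2 of a real search.

from urllib.parse import urlencode

def bing_news_urls(query, pages=3, per_page=10):
    # Assumed pagination: "first" is the 1-based index of the first result.
    for i in range(pages):
        yield "https://www.bing.com/news/search?" + urlencode({"q": query, "first": 1 + i * per_page})

def yahoo_news_urls(query, pages=3, per_page=10):
    # Assumed pagination: "b" is the 1-based offset of the first result.
    for i in range(pages):
        yield "https://news.search.yahoo.com/search?" + urlencode({"p": query, "b": 1 + i * per_page})

for url in bing_news_urls("python web scraping"):
    print(url)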

How Can I Export Scraped Data to Excel Horizontally?

怎甘沉沦 submitted on 2020-01-05 07:21:19

Question: I'm relatively new to Python. Using this site as an example, I'm trying to scrape the restaurants' information, but I'm not sure how to pivot the data horizontally when it's being read vertically. I'd like the Excel sheet to have six columns as follows: Name, Street, City, State, Zip, Phone. This is the code I'm using:

from selenium import webdriver
from bs4 import BeautifulSoup
from urllib.request import urlopen
import time

driver = webdriver.Chrome(executable_path=r"C:\Downloads
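A sketch of just the pivot step, using placeholder records in place of whatever the selenium/bs4 code actually scrapes: collect the six fields for one restaurant into a single list and append that list as one spreadsheet row, instead of writing each field on its own row.

import openpyxl

# Placeholder data standing in for the scraped (name, street, city, state, zip, phone) fields.
records = [
    ("Example Diner", "1 Main St", "Springfield", "CA", "90000", "555-0100"),
    ("Sample Grill", "2 Oak Ave", "Springfield", "CA", "90001", "555-0101"),
]

wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Name", "Street", "City", "State", "Zip", "Phone"])

for record in records:
    ws.append(list(record))  # one restaurant per row, six columns across

wb.save("restaurants.xlsx")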

BeautifulSoup XML to CSV

核能气质少年 submitted on 2020-01-05 07:14:55

Question: The code below takes an XML file and parses it into a CSV file.

import openpyxl
from bs4 import BeautifulSoup

with open('1last.xml') as f_input:
    soup = BeautifulSoup(f_input, 'lxml')

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Sheet1"

ws.append(["Description", "num", "text"])
for description in soup.find_all("description"):
    ws.append(["", description['num'], description.text])

ws.append(["SetData", "x", "value", "xin", "xax"])
for setdata in soup.find_all("setdata"):
    ws.append(["",
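Since the title says CSV while the shown code builds an .xlsx workbook with openpyxl, here is a hedged sketch of the same first loop written with Python's csv module. The tag name and input filename follow the question; the output filename is a placeholder.

import csv
from bs4 import BeautifulSoup

with open('1last.xml') as f_input:
    soup = BeautifulSoup(f_input, 'lxml')

with open('output.csv', 'w', newline='') as f_output:
    writer = csv.writer(f_output)
    writer.writerow(["Description", "num", "text"])
    for description in soup.find_all("description"):
        # .get() avoids a KeyError if a <description> lacks the num attribute
        writer.writerow(["", description.get('num', ''), description.get_text(strip=True)])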

Scrape data from the California lottery into a dataframe with BeautifulSoup

醉酒当歌 submitted on 2020-01-05 07:01:26

Question: I'm working on a project to scrape and parse data from the California lottery into a dataframe. Here's my code so far; it produces no error but also no output:

import requests
from bs4 import BeautifulSoup as bs4

draw = 'http://www.calottery.com/play/draw-games/superlotto-plus/winning-numbers/?page=1'
page = requests.get(draw)
soup = bs4(page.text)

drawing_list = []
for table_row in soup.select("table.tag_even_numbers tr"):
    cells = table_row.findAll('td')
    if len(cells) > 0:
        draw_date = cells[0]
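A sketch of one way to finish the loop and see why nothing comes out: reuse the question's URL and selector, print how many rows the selector actually matches (zero would suggest the table is rendered by JavaScript or carries a different class), and collect the cell text into a pandas DataFrame. The diagnostic print and the use of pandas are additions, not part of the original code.

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs4

draw = 'http://www.calottery.com/play/draw-games/superlotto-plus/winning-numbers/?page=1'
page = requests.get(draw)
soup = bs4(page.text, 'html.parser')

rows = soup.select("table.tag_even_numbers tr")
print("matched rows:", len(rows))  # 0 here means the selector finds nothing to loop over

drawing_list = []
for table_row in rows:
    cells = table_row.find_all('td')
    if cells:
        drawing_list.append([cell.get_text(strip=True) for cell in cells])

df = pd.DataFrame(drawing_list)
print(df.head())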