beautifulsoup

Beautiful Soup returning nothing

China☆狼群 submitted on 2020-01-05 10:11:21

Question: Hi, I am working on a project for my school that involves scraping HTML. However, I get None returned when I look for tables. Here is the segment that experiences the issue; if you need more info I'd be happy to give it to you:

from bs4 import BeautifulSoup
import urllib2
import datetime

# This section determines the date of the next Saturday, which will go onto the end of the URL
d = datetime.date.today()
while d.weekday() != 5:
    d += datetime.timedelta(1)
# temporary logic for testing when
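The excerpt cuts off before the part that actually fetches the page and searches for tables, so the snippet below is only a minimal diagnostic sketch (not the asker's code, and with a placeholder URL): it checks whether any table tags are present in the downloaded HTML at all. If none are found, the tables are most likely rendered client-side by JavaScript and will never be visible to Beautiful Soup.

import urllib2  # Python 2, matching the question's imports
from bs4 import BeautifulSoup

url = "http://example.com/schedule"  # placeholder; the real URL ends with the computed Saturday date
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")
if not tables:
    # Nothing matched: the served markup may differ from what the browser
    # shows, or the tables may be added client-side by JavaScript.
    print("No <table> tags found in the downloaded HTML")
else:
    print("Found %d table(s)" % len(tables))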

Cleaning text string after getting body text using Beautifulsoup

白昼怎懂夜的黑 submitted on 2020-01-05 09:34:58

Question: I'm trying to get text from articles on various webpages and write them out as clean text documents. I don't want all visible text, because that often includes irrelevant links on the side of webpages. I'm using Beautiful Soup to extract the information from pages, but extra links, not just those on the side of the page but also ones that appear in the middle of the body text and at the bottom of the articles, sometimes make it into the final product. Does anyone know how to deal with the problem of extra
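One common approach, sketched below under assumptions (the tag and class choices are illustrative and vary from site to site), is to delete the tags that usually hold navigation and link boxes before extracting text, and then read only the paragraph tags inside the article container.

import requests
from bs4 import BeautifulSoup

html = requests.get("http://example.com/article").text  # placeholder URL
soup = BeautifulSoup(html, "html.parser")

# Drop elements that usually hold navigation, ads, or related-links boxes.
for tag in soup(["script", "style", "nav", "aside", "header", "footer"]):
    tag.decompose()

# Read text only from <p> tags inside the (assumed) <article> container.
article = soup.find("article") or soup
paragraphs = [p.get_text(" ", strip=True) for p in article.find_all("p")]
clean_text = "\n\n".join(paragraphs)
print(clean_text)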

What's needed to get BeautifulSoup4+lxml to work with cx_freeze?

怎甘沉沦 submitted on 2020-01-05 08:51:50

Question: Summary: I have a wxPython/bs4 app that I'm building into an exe with cx_freeze. The build succeeds with no errors, but trying to run the EXE results in a FeatureNotFound error from BeautifulSoup4: it complains that I don't have the lxml library installed. I've since stripped the program down to its minimal state and still get the error. Has anyone else had success building a bs4 app with cx_freeze? Please take a look at the details below and let me know of any ideas you may have. Thanks,
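A frequently suggested direction, shown here only as a sketch, is to tell cx_Freeze explicitly to bundle lxml and bs4, since the freezer can miss lxml's compiled submodules. The module names in the includes list and the project metadata are common suggestions and placeholders, not settings verified against this particular app.

from cx_Freeze import setup, Executable

build_exe_options = {
    "packages": ["lxml", "bs4"],          # force-include both packages
    "includes": ["lxml._elementpath"],    # submodule the freezer often misses
}

setup(
    name="myapp",                         # placeholder project metadata
    version="0.1",
    options={"build_exe": build_exe_options},
    executables=[Executable("main.py")],  # placeholder entry script
)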

Generating URLs for Yahoo and Bing scraping for multiple pages with Python and BeautifulSoup

家住魔仙堡 submitted on 2020-01-05 08:27:29

Question: I want to scrape news from different sources. I found a way to generate the URL for scraping multiple pages from Google, but I think there is a way to generate a much shorter link. Can you please tell me how to generate the URLs for scraping multiple pages of Bing and Yahoo news, and also, is there a way to make the Google URL shorter? This is the code for Google:

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML,
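As a sketch of one way to keep the links short, the paginated query URLs can be built with urlencode rather than string concatenation. The pagination parameter names below (Bing's first offset, Yahoo's b offset) are assumptions based on how the engines' result pages paginate in a browser; confirm them against the address bar on page 2 of a real search.

from urllib.parse import urlencode

def bing_news_urls(query, pages=3, per_page=10):
    # Assumed pagination: "first" is the 1-based index of the first result.
    for i in range(pages):
        yield "https://www.bing.com/news/search?" + urlencode({"q": query, "first": 1 + i * per_page})

def yahoo_news_urls(query, pages=3, per_page=10):
    # Assumed pagination: "b" is the 1-based offset of the first result.
    for i in range(pages):
        yield "https://news.search.yahoo.com/search?" + urlencode({"p": query, "b": 1 + i * per_page})

for url in bing_news_urls("python web scraping"):
    print(url)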

How Can I Export Scraped Data to Excel Horizontally?

怎甘沉沦 submitted on 2020-01-05 07:21:19

Question: I'm relatively new to Python. Using this site as an example, I'm trying to scrape the restaurants' information, but I'm not sure how to pivot the data horizontally when it's being read vertically. I'd like the Excel sheet to have six columns as follows: Name, Street, City, State, Zip, Phone. This is the code I'm using:

from selenium import webdriver
from bs4 import BeautifulSoup
from urllib.request import urlopen
import time

driver = webdriver.Chrome(executable_path=r"C:\Downloads
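A sketch of just the pivot step, using placeholder records in place of whatever the selenium/bs4 code actually scrapes: collect the six fields for one restaurant into a single list and append that list as one spreadsheet row, instead of writing each field on its own row.

import openpyxl

# Placeholder data standing in for the scraped (name, street, city, state, zip, phone) fields.
records = [
    ("Example Diner", "1 Main St", "Springfield", "CA", "90000", "555-0100"),
    ("Sample Grill", "2 Oak Ave", "Springfield", "CA", "90001", "555-0101"),
]

wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Name", "Street", "City", "State", "Zip", "Phone"])

for record in records:
    ws.append(list(record))  # one restaurant per row, six columns across

wb.save("restaurants.xlsx")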

BeautifulSoup XML to CSV

核能气质少年 submitted on 2020-01-05 07:14:55

Question: The code below takes an XML file and parses it into a CSV file.

import openpyxl
from bs4 import BeautifulSoup

with open('1last.xml') as f_input:
    soup = BeautifulSoup(f_input, 'lxml')

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Sheet1"

ws.append(["Description", "num", "text"])
for description in soup.find_all("description"):
    ws.append(["", description['num'], description.text])

ws.append(["SetData", "x", "value", "xin", "xax"])
for setdata in soup.find_all("setdata"):
    ws.append(["",
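Since the title says CSV while the shown code builds an .xlsx workbook with openpyxl, here is a hedged sketch of the same first loop written with Python's csv module. The tag name and input filename follow the question; the output filename is a placeholder.

import csv
from bs4 import BeautifulSoup

with open('1last.xml') as f_input:
    soup = BeautifulSoup(f_input, 'lxml')

with open('output.csv', 'w', newline='') as f_output:
    writer = csv.writer(f_output)
    writer.writerow(["Description", "num", "text"])
    for description in soup.find_all("description"):
        # .get() avoids a KeyError if a <description> lacks the num attribute
        writer.writerow(["", description.get('num', ''), description.get_text(strip=True)])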

Scrape data from the California lottery into a dataframe with BeautifulSoup

醉酒当歌 submitted on 2020-01-05 07:01:26

Question: I'm working on a project to scrape and parse data from the California lottery into a dataframe. Here's my code so far; it produces no error but also no output:

import requests
from bs4 import BeautifulSoup as bs4

draw = 'http://www.calottery.com/play/draw-games/superlotto-plus/winning-numbers/?page=1'
page = requests.get(draw)
soup = bs4(page.text)

drawing_list = []
for table_row in soup.select("table.tag_even_numbers tr"):
    cells = table_row.findAll('td')
    if len(cells) > 0:
        draw_date = cells[0]
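A sketch of one way to finish the loop and see why nothing comes out: reuse the question's URL and selector, print how many rows the selector actually matches (zero would suggest the table is rendered by JavaScript or carries a different class), and collect the cell text into a pandas DataFrame. The diagnostic print and the use of pandas are additions, not part of the original code.

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs4

draw = 'http://www.calottery.com/play/draw-games/superlotto-plus/winning-numbers/?page=1'
page = requests.get(draw)
soup = bs4(page.text, 'html.parser')

rows = soup.select("table.tag_even_numbers tr")
print("matched rows:", len(rows))  # 0 here means the selector finds nothing to loop over

drawing_list = []
for table_row in rows:
    cells = table_row.find_all('td')
    if cells:
        drawing_list.append([cell.get_text(strip=True) for cell in cells])

df = pd.DataFrame(drawing_list)
print(df.head())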