beautifulsoup

Find a tag using text it contains using BeautifulSoup

荒凉一梦 submitted on 2021-02-20 05:16:37
Question: I am trying to scrape parts of this page, https://markets.businessinsider.com/stocks/bp-stock, using BeautifulSoup to search for text contained in the h2 titles of tables. When I do:

    data_table = soup.find('h2', text=re.compile('RELATED STOCKS')).find_parent('div').find('table')

it correctly gets the table I am after. When I try to get the "Analyst Opinion" table using a similar line, it returns None:

    data_table = soup.find('h2', text=re.compile('ANALYST OPINIONS')).find_parent('div')
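
One common reason a text search returns None is that the heading contains nested tags, so it has no single string for `text=` (spelled `string=` in newer releases) to match against. The sketch below illustrates this on made-up HTML; the markup and the helper name `heading_contains` are assumptions for illustration, not the actual structure of the page in the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the second h2 wraps part of its text in a <span>.
html = """
<div><h2>RELATED STOCKS</h2><table id="related"></table></div>
<div><h2><span>BP</span> ANALYST OPINIONS</h2><table id="analyst"></table></div>
"""
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("ANALYST OPINIONS")

# string= compares against tag.string, which is None when the tag
# has more than one child, so this lookup finds nothing.
direct = soup.find("h2", string=pattern)


def heading_contains(tag):
    """Match h2 tags whose combined text (including nested tags) fits the pattern."""
    return tag.name == "h2" and pattern.search(tag.get_text()) is not None


heading = soup.find(heading_contains)
# Guard against a missing heading before walking up to the parent div.
table = heading.find_parent("div").find("table") if heading else None
```

Matching on `get_text()` is a design choice that trades precision for robustness: it also matches headings where the phrase is split across child tags.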

BeautifulSoup isn't working while web scraping Amazon

老子叫甜甜 submitted on 2021-02-20 04:13:09
Question: I'm new to web scraping and I am trying to apply basic skills to Amazon. I want to write code that finds the top 10 'Today's Greatest Deals' with prices, ratings, and other information. Every time I try to find a specific tag using find() with a class specified, it returns None, even though the actual HTML contains that tag. On manual inspection I found that half of the HTML isn't displayed in the output terminal. Only half of the code is displayed, yet the body and html tags do close. Just a huge

BeautifulSoup Scraping td & tr

扶醉桌前 submitted on 2021-02-19 09:26:08
Question: I am trying to extract the price data (high and low) from the third table (corn). The code returns "None":

    import urllib2
    from bs4 import BeautifulSoup
    import time
    import re

    start_urls = 4539
    nb_quotes = 10
    for urls in range(start_urls, start_urls - nb_quotes, -1):
        start_time = time.time()
        # construct the URLs strings
        url = 'http://markets.iowafarmbureau.com/markets/fixed.php?page=egrains'
        # Read the HTML page content
        page = urllib2.urlopen(url)
        # Create a beautifulsoup object
        soup =
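
As a rough illustration of pulling high and low values out of the third table on a page, the sketch below parses inline sample HTML. The table layout, column order, and price values are invented for the example and are not taken from the site in the question.

```python
from bs4 import BeautifulSoup

# Made-up sample page: three tables, the third holding contract/high/low rows.
html = """
<table><tr><td>Wheat</td></tr></table>
<table><tr><td>Soybeans</td></tr></table>
<table>
  <tr><th>Contract</th><th>High</th><th>Low</th></tr>
  <tr><td>Mar</td><td>551.25</td><td>544.50</td></tr>
  <tr><td>May</td><td>550.00</td><td>543.75</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")
prices = []
if len(tables) >= 3:  # avoid an IndexError when the page has fewer tables
    for row in tables[2].find_all("tr"):
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == 3:
            contract, high, low = cells
            prices.append((contract, float(high), float(low)))

print(prices)
```

Checking the number of tables and cells before indexing keeps a layout change from surfacing as a confusing None or IndexError further down.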

ImportError: No module named bs4 on mac

半世苍凉 submitted on 2021-02-19 05:09:52
Question: I sat down tonight and decided to learn how to use Python, inspired by this webpage scraping article: cam.ly/danesblog/2011/01/craigslist-arbitrage/. After working through a tutorial I:
1) downloaded and installed Python from http://www.python.org/getit/ (first 3.3, then 2.7)
2) downloaded bs4 from www.crummy.com/software/BeautifulSoup/bs4/download/
3) followed Brian Clapper's instructions in "How can I install the Beautiful Soup module on the Mac?" and tried both easy_install and python setup.py install
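
With two Python versions installed, a plausible cause of this ImportError is that bs4 was installed for one interpreter while the script runs under the other. The following diagnostic sketch assumes a reasonably recent Python 3 (it will not run on 2.7); the helper name `describe_module` is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Return where the running interpreter would load `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Show which interpreter is actually executing this script.
print("interpreter:", sys.executable)

location = describe_module("bs4")
if location is None:
    # Installing through this exact interpreter avoids version mix-ups.
    print("bs4 is not installed for this interpreter; try:")
    print(f"  {sys.executable} -m pip install beautifulsoup4")
else:
    print("bs4 found at:", location)
```

Running the installer as `<interpreter> -m pip` ties the installation to the interpreter that will later import the package.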

regex for loop over list in python

不问归期 submitted on 2021-02-19 02:52:06
Question: I have this list: [<th align="left"> <a href="blablabla">F</a>ojweousa</th>, <th align="left"> <a href="blablabla">S</a>awdefrgt</th>, ...] and want the single character after "> and the multiple characters between </a> and </th> to be concatenated so that I can move on with my life. Here is my code:

    item2 = []
    for element in items2:
        first_letter = re.search('">.</a', str(items2))
        second_letter = re.search(r'</a>[a-zA-Z0-9]</th>,', str(items2))
        item2.append([str(first_letter) + str(second
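
The snippet in the question searches `str(items2)` (the whole list) on every iteration, converts the match object itself to a string, and allows only one character after `</a>`. A minimal sketch of one way to get the concatenated text, assuming each element stringifies to the markup shown in the question, uses capture groups on the current element:

```python
import re

# Stand-ins for the tag objects in the question, already converted to strings.
items = [
    '<th align="left"> <a href="blablabla">F</a>ojweousa</th>',
    '<th align="left"> <a href="blablabla">S</a>awdefrgt</th>',
]

# Group 1: the single character inside the link.
# Group 2: everything between </a> and </th>.
pattern = re.compile(r'">(.)</a>([^<]*)</th>')

names = []
for element in items:
    match = pattern.search(str(element))  # search this element, not the whole list
    if match:
        names.append(match.group(1) + match.group(2))

print(names)
```

If the elements are BeautifulSoup tags, calling `get_text(strip=True)` on each one would likely give the same result without a regular expression.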

Beautifulsoup Python unable to scrape data from a website

别说谁变了你拦得住时间么 submitted on 2021-02-19 01:49:25
Question: I have been using Python and BeautifulSoup to scrape data, so far successfully, but I am stuck on the following website. Target site: LyricsHindiSong. My goal is to scrape song lyrics from this website, but every time it gives a blank result or a "NoneType object has no attribute" kind of error. I have been struggling for the last 15 days and have not been able to figure out where the problem is or how to fix it. Following is the code I am using:

    import pymysql
    import requests
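
A "NoneType object has no attribute" error usually means `find()` matched nothing and the code then called a method on the None it returned. The sketch below shows one defensive pattern on invented HTML; the class names and the helper `text_or_default` are assumptions and say nothing about the target site's real markup.

```python
from bs4 import BeautifulSoup

# Invented page fragment: it has a "post" block but no "lyrics" block.
html = '<div class="post"><h1>Song title</h1></div>'
soup = BeautifulSoup(html, "html.parser")


def text_or_default(soup, name, css_class, default=""):
    """Return the stripped text of the first matching tag, or `default` if absent."""
    tag = soup.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else default


lyrics = text_or_default(soup, "div", "lyrics")  # no such tag: falls back to ""
title = text_or_default(soup, "div", "post")     # tag exists: its text is returned
```

An empty result from a lookup like this is also a hint to inspect the raw response, since the content may be absent from the HTML that was actually downloaded.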

BeautifulSoup, Requests, Dataframe Saving to Excel arrays error

被刻印的时光 ゝ submitted on 2021-02-18 18:59:53
Question: I am a novice at Python and am helping out on a school project. Any help is much appreciated. Thanks. I get an error when it gets to the years 2004 and 2003, and it is caused by the result_list list. The error is "ValueError: arrays must all be same length". How can I introduce code that fixes this? The scores are important....

    import requests
    import pandas as pd
    from pandas import ExcelWriter
    from bs4 import BeautifulSoup
    #from openpyxl.writer.excel import ExcelWriter
    import openpyxl
    #from
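
pandas raises "arrays must all be same length" when a DataFrame is built from a dict whose lists differ in length. One possible workaround, sketched here with the standard library only, is to pad the shorter lists before building the frame. The years and scores are made-up sample data, and `pad_columns` is a hypothetical helper, not part of pandas.

```python
def pad_columns(columns, fill=None):
    """Return a copy of `columns` with every list padded to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        key: list(values) + [fill] * (longest - len(values))
        for key, values in columns.items()
    }


# Made-up scores: some years have fewer entries than others.
result = {"2005": [10, 12, 9], "2004": [11, 8], "2003": [7]}
padded = pad_columns(result)

print(padded)
```

The padded dict has equal-length lists, so it can be passed to `pandas.DataFrame` without the ValueError; the padded cells then show up as missing values rather than scores.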