beautifulsoup | 易学教程

Find a tag using text it contains using BeautifulSoup

Question: I am trying to scrape parts of this page: https://markets.businessinsider.com/stocks/bp-stock using BeautifulSoup, searching for text contained in the h2 titles of tables. When I do: data_table = soup.find('h2', text=re.compile('RELATED STOCKS')).find_parent('div').find('table') it correctly gets the table I am after. When I try to get the "Analyst Opinion" table using a similar line, it returns None: data_table = soup.find('h2', text=re.compile('ANALYST OPINIONS')).find_parent('div')
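
One possible cause, sketched under assumptions: when a tag name is combined with text= (string= in newer BeautifulSoup releases), the pattern is tested against the tag's .string, which is None whenever the tag has more than one child. If the second heading contains nested markup, the lookup fails even though the text is visible. The HTML below is hypothetical, not the real page, and find_heading is an illustrative helper, not a BeautifulSoup API.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the second <h2> wraps part of its text in a <span>.
html = """
<div><h2>RELATED STOCKS</h2><table id="related"></table></div>
<div><h2><span>ANALYST</span> OPINIONS</h2><table id="analyst"></table></div>
"""
soup = BeautifulSoup(html, "html.parser")


def find_heading(soup, pattern):
    """Return the first <h2> whose full visible text matches pattern."""
    regex = re.compile(pattern)
    return soup.find(
        lambda tag: tag.name == "h2"
        and regex.search(tag.get_text(" ", strip=True)) is not None
    )


# Works: the heading has a single text child, so .string is set.
plain = soup.find("h2", string=re.compile("RELATED STOCKS"))

# Fails: the heading has two children, so .string is None.
nested = soup.find("h2", string=re.compile("ANALYST OPINIONS"))

# Matching on get_text() instead looks at all descendant text.
robust = find_heading(soup, r"ANALYST\s+OPINIONS")
table = robust.find_parent("div").find("table")
```

If the heading is missing from the downloaded HTML altogether (for example because it is added by JavaScript), neither approach will find it, so it is worth checking the raw response first.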

BeautifulSoup isn't working while web scraping Amazon

Question: I'm new to web scraping and I am trying to use basic skills on Amazon. I want to write code that finds the top 10 'Today's Greatest Deals' with prices, ratings, and other information. Every time I try to find a specific tag using find() and specifying a class, it returns 'None', even though the actual HTML has that tag. On manual inspection I found that half of the HTML isn't displayed in the output terminal. Only half is displayed, yet the body and html tags do close. Just a huge

BeautifulSoup Scraping td & tr

Question: I am trying to extract the price data (high and low) from the 3rd table (corn). The code returns "None": import urllib2 from bs4 import BeautifulSoup import time import re start_urls = 4539 nb_quotes = 10 for urls in range (start_urls, start_urls - nb_quotes, -1): start_time = time.time() # construct the URLs strings url = 'http://markets.iowafarmbureau.com/markets/fixed.php?page=egrains' # Read the HTML page content page = urllib2.urlopen(url) # Create a beautifulsoup object soup =
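
A minimal sketch of reading rows and cells from the third table on a page. The markup and column layout are assumptions made for illustration, since the real page's structure is not shown in the excerpt. Note also that urllib2 exists only in Python 2; on Python 3 the equivalent module is urllib.request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: three tables,
# the third holding contract name, high, and low columns.
html = """
<table><tr><td>first table</td></tr></table>
<table><tr><td>second table</td></tr></table>
<table>
  <tr><th>Contract</th><th>High</th><th>Low</th></tr>
  <tr><td>Corn Dec</td><td>3.75</td><td>3.70</td></tr>
  <tr><td>Corn Mar</td><td>3.88</td><td>3.82</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns tables in document order, so index 2 is the third one.
corn_table = soup.find_all("table")[2]

rows = []
for tr in corn_table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # The header row uses <th>, so it yields no <td> cells and is skipped.
    if len(cells) == 3:
        name, high, low = cells
        rows.append((name, float(high), float(low)))
```

Indexing tables by position is fragile if the page layout changes; selecting by an id or a nearby heading is usually more robust when one is available.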

ImportError: No module named bs4 on mac

Question: I sat down tonight and decided to learn how to use Python, inspired by this web page scraping article: cam.ly/danesblog/2011/01/craigslist-arbitrage/ After working through a tutorial I: 1) downloaded and installed Python: http://www.python.org/getit/ (first 3.3, then 2.7) 2) downloaded bs4: www.crummy.com/software/BeautifulSoup/bs4/download/ 3) followed Brian Clapper's instructions: How can I install the Beautiful Soup module on the Mac? I tried both easy_install and python setup.py install
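
With two Python versions installed, one common cause of this error is that the package was installed for one interpreter while the script runs under the other. That is an assumption about this case, not something the excerpt confirms. The sketch below uses only the standard library to report which interpreter is running and whether it can see a module; describe_module is an illustrative helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether the running interpreter can import name, and from where."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


# Run this with the same command used to launch the failing script.
info = describe_module("bs4")
print(info)
```

If "found" is False, installing through the interpreter itself (for example python -m pip install beautifulsoup4, using the same python command) ensures the package lands where that interpreter looks.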

regex for loop over list in python

Question: I have this list [<th align="left"> <a href="blablabla">F</a>ojweousa</th>, <th align="left"> <a href="blablabla">S</a>awdefrgt</th>, ...] and want the single character after "> and the multiple characters between </a> and </th> to be concatenated so that I can move on with my life. Here is my code: item2 = [] for element in items2: first_letter = re.search('">.</a', str(items2)) second_letter = re.search(r'</a>[a-zA-Z0-9]</th>,', str(items2)) item2.append([str(first_letter) + str(second
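
The loop in the question searches str(items2), the whole list, on every iteration instead of the current element, matches only one character after </a>, and converts the match object itself to a string rather than reading the matched text. A minimal sketch of one way to fix this, using plain strings modeled on the list in the question (the real items appear to be BeautifulSoup tags, hence the str() call):

```python
import re

# Sample strings modeled on the list shown in the question.
items2 = [
    '<th align="left"> <a href="blablabla">F</a>ojweousa</th>',
    '<th align="left"> <a href="blablabla">S</a>awdefrgt</th>',
]

# Group 1: the single character inside the link.
# Group 2: one or more characters between </a> and </th>.
pattern = re.compile(r'">(.)</a>([A-Za-z0-9]+)</th>')

item2 = []
for element in items2:
    match = pattern.search(str(element))  # search this element only
    if match:
        item2.append(match.group(1) + match.group(2))

print(item2)  # ['Fojweousa', 'Sawdefrgt']
```

If the elements are BeautifulSoup tags, calling element.get_text(strip=True) on each one should give the same concatenated text without a regular expression.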

Beautifulsoup Python unable to scrape data from a website

Question: I have been using Python and BeautifulSoup to scrape data, successfully so far, but I am stuck on the following website. Target site: LyricsHindiSong. My goal is to scrape song lyrics from that website, but every time I get a blank result or a "NoneType object has no attribute" kind of error. I have been struggling for the last 15 days and have not been able to figure out where the problem is or how to fix it. The following is the code I am using: import pymysql import requests

BeautifulSoup, Requests, Dataframe Saving to Excel arrays error

Question: I am a novice at Python, helping out on a school project. Any help is much appreciated. Thanks. I get an error when it gets to the years 2004 and 2003, caused by the result_list list. The error is "ValueError: arrays must all be same length". How can I introduce code that fixes this? The scores are important. import requests import pandas as pd from pandas import ExcelWriter from bs4 import BeautifulSoup #from openpyxl.writer.excel import ExcelWriter import openpyxl #from
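
pandas raises this error when a DataFrame is built from a dict of lists whose lengths differ, which can happen when some years' pages yield fewer scraped values than others. The structure of result_list is not shown in the excerpt, so the sketch below is an assumption-laden illustration: pad_columns is a hypothetical helper that pads shorter columns with a fill value so all lists end up the same length.

```python
def pad_columns(columns, fill=None):
    """Return a copy of columns in which every list is padded to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: list(values) + [fill] * (longest - len(values))
        for name, values in columns.items()
    }


# Hypothetical scraped data: the "score" column is one entry short.
scraped = {"team": ["A", "B", "C"], "score": [10, 7]}
result = pad_columns(scraped)
print(result)  # {'team': ['A', 'B', 'C'], 'score': [10, 7, None]}
```

The padded dict can then be passed to pd.DataFrame. Padding at the end only lines up correctly if the missing values really are at the end; if a value is missing in the middle of a column, it is safer to collect the data row by row during scraping so each score stays attached to its own record.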