bs4

Compiled with cx_Freeze, Beautiful Soup program won't run in console

大憨熊 submitted on 2019-12-10 23:15:13
Question: This is the error I get when I run the program's EXE file. The program runs fine in PyCharm but raises this error in the console:

    bs4.FeatureNotFound: Couldn't find a Tree Builder with features you requested. Do you need to install a parser library?

The setup script:

    import sys
    from cx_Freeze import setup, Executable

    build_exe_options = {"packages": ["bs4, urllib, requests"], "excludes": [""]}
    base = None
    setup(
        name = "Weather",
        version = "0.9.0",
        options = {"program": build_exe_options},
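A likely cause, offered as a hedged sketch rather than a confirmed fix: the frozen build does not include the parser library bs4 was asked for, so bs4 cannot find a tree builder. Two details of the script also look off: `"bs4, urllib, requests"` is one string rather than three package names, and cx_Freeze reads these options under the `"build_exe"` key, not `"program"`. The sketch assumes the program parses with lxml and that the script is named `weather.py` (both hypothetical). `make_soup` is a hypothetical helper that falls back to Python's built-in `html.parser` when lxml is missing; that avoids the error but may parse some markup differently.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# One entry per package name; "lxml" is listed so the parser gets bundled
# (assumption: the program parses with lxml).
build_exe_options = {
    "packages": ["bs4", "urllib", "requests", "lxml"],
    "excludes": [],
}


def build():
    """Run the cx_Freeze build (call from setup.py, then run `python setup.py build`)."""
    # Imported here so the rest of this module works without cx_Freeze installed.
    from cx_Freeze import setup, Executable

    setup(
        name="Weather",
        version="0.9.0",
        options={"build_exe": build_exe_options},  # cx_Freeze reads "build_exe", not "program"
        # Hypothetical script name; base=None keeps it a console application.
        executables=[Executable("weather.py", base=None)],
    )


def make_soup(markup):
    """Hypothetical helper: prefer lxml, fall back to the stdlib parser if it is missing."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")
```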

regex not working in bs4

余生颓废 submitted on 2019-12-10 22:02:19
Question: I am trying to extract links for a specific file hoster from the watchseriesfree.to website. In the following case I want rapidvideo links, so I use a regex to filter the tags whose text contains "rapidvideo":

    import re
    import urllib2
    from bs4 import BeautifulSoup

    def gethtml(link):
        req = urllib2.Request(link, headers={'User-Agent': "Magic Browser"})
        con = urllib2.urlopen(req)
        html = con.read()
        return html

    def findLatest():
        url = "https://watchseriesfree.to/serie/Madam-Secretary"
        head =
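One common reason a regex filter seems not to work (an assumption for this case, since the rest of the code is cut off): `find_all("a", string=pattern)` tests the tag's `.string`, which is `None` when the tag has more than one child, so links whose text is split across nested tags are skipped. This sketch uses hypothetical markup and Python 3 (the question uses Python 2's `urllib2`). `links_by_string` and `links_by_text` are illustrative helpers. bs4 versions before 4.4 call the `string` argument `text`.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
HTML = """
<table>
  <tr><td><a href="/link/1">Watch on <b>rapidvideo</b></a></td></tr>
  <tr><td><a href="/link/2">openload</a></td></tr>
  <tr><td><a href="/link/3">rapidvideo</a></td></tr>
</table>
"""
PATTERN = re.compile("rapidvideo")


def links_by_string(markup, pattern):
    """Matches only tags whose .string (a single text child) matches the pattern."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a["href"] for a in soup.find_all("a", string=pattern)]


def links_by_text(markup, pattern):
    """Searches the full text of each <a>, including text inside nested tags."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a["href"] for a in soup.find_all("a") if pattern.search(a.get_text())]


print(links_by_string(HTML, PATTERN))  # misses /link/1, whose text is split across tags
print(links_by_text(HTML, PATTERN))
```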

BeautifulSoup (bs4) parsing wrong

百般思念 submitted on 2019-12-10 20:58:35
Question: Parsing this sample document with bs4 under Python 2.7.6:

    <html>
    <body>
    <p>HTML allows omitting P end-tags.
    <p>Like that and this.
    <p>And this, too.
    <p>What happened?</p>
    <p>And can we <p>nest a paragraph, too?</p></p>
    </body>
    </html>

Using:

    from bs4 import BeautifulSoup as BS
    ...
    tree = BS(fh)

HTML has long allowed end tags to be omitted for various element types, including P (check the schema, or a parser). However, bs4's prettify() output for this document shows that it doesn't end any of those
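Whether omitted `</p>` tags are closed implicitly depends on the parser bs4 delegates to. `BS(fh)` with no parser argument picks the best one installed (lxml, then html5lib, then Python's built-in `html.parser`). `html.parser` nests the unclosed paragraphs, while lxml and html5lib apply HTML's rules and close them. A minimal sketch comparing parsers on a shortened copy of the sample; `nested_paragraphs` is a hypothetical helper, and parsers that are not installed are skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened copy of the sample document from the question.
DOC = """<html><body>
<p>HTML allows omitting P end-tags.
<p>Like that and this.
<p>And this, too.
<p>What happened?</p>
</body></html>"""


def nested_paragraphs(markup, parser):
    """Count <p> elements that ended up inside another <p>."""
    soup = BeautifulSoup(markup, parser)
    return sum(1 for p in soup.find_all("p") if p.find_parent("p") is not None)


for parser in ("html.parser", "lxml", "html5lib"):
    try:
        print(parser, nested_paragraphs(DOC, parser))
    except FeatureNotFound:
        print(parser, "is not installed")
```

Naming the parser explicitly (for example `BS(fh, "lxml")`) also makes the result the same on every machine.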

Python - AttributeError: 'NoneType' object has no attribute 'get_text'

左心房为你撑大大i submitted on 2019-12-10 11:55:14
Question: I am following a bs4 tutorial and trying to call get_text() on the 'a' tag in the example below. The tutorial returns "McDermott International" and "MDR" without problems, but when I run it I get AttributeError: 'NoneType' object has no attribute 'get_text'. Please help, many thanks!

    with open('Energy.htm') as f:
        soup = BeautifulSoup(f, "lxml")
    energylist = soup.find_all('td', {"style": "text-align:left;"})
    for stock in energylist:
        try:
            stock_name = stock.find('a').get_text()
        except:
            stock_name = ''
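`find()` returns `None` when no matching tag exists, so any cell without an `<a>` makes `stock.find('a').get_text()` fail. A bare `except` hides that error but also hides every other one; checking for `None` explicitly is narrower. A hedged sketch on hypothetical markup modelled on the two values the tutorial returns; `cell_texts` is an illustrative helper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one cell has a link, the other has plain text only.
HTML = """<table>
<tr><td style="text-align:left;"><a href="#">McDermott International</a></td>
<td style="text-align:left;">MDR</td></tr>
</table>"""


def cell_texts(markup):
    soup = BeautifulSoup(markup, "html.parser")
    names = []
    for cell in soup.find_all("td", {"style": "text-align:left;"}):
        link = cell.find("a")
        # find() gives None when the cell has no <a>; use the cell's own text then.
        source = link if link is not None else cell
        names.append(source.get_text(strip=True))
    return names


print(cell_texts(HTML))
```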

How Do I Remove An XML Declaration Using BeautifulSoup4

杀马特。学长 韩版系。学妹 submitted on 2019-12-10 11:30:21
Question: I have an XHTML file that is structured like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html>
    <html lang="en">
    <head> ... </head>
    <body> ... </body>
    </html>

I'm using BeautifulSoup and I want to remove the XML declaration from the document, so that what I have looks like this:

    <!DOCTYPE html>
    <html lang="en">
    <head> ... </head>
    <body> ... </body>
    </html>

I can't find a way to get at the XML declaration to remove it. It doesn't appear to be a Doctype, Declaration, Tag, or NavigableString
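When the file is parsed with Python's built-in `html.parser`, bs4 stores the `<?xml ...?>` declaration as a `ProcessingInstruction` node (a `NavigableString` subclass), which `extract()` can remove. A minimal sketch under that assumption: other parsers may represent the declaration differently, the markup is a shortened stand-in for the real file, and `strip_xml_declaration` is a hypothetical helper.

```python
from bs4 import BeautifulSoup
from bs4.element import ProcessingInstruction

# Shortened stand-in for the XHTML file.
XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html lang="en"><head><title>t</title></head><body><p>body</p></body></html>"""


def strip_xml_declaration(markup):
    soup = BeautifulSoup(markup, "html.parser")
    # Copy the list first, because extract() mutates soup.contents.
    for node in list(soup.contents):
        if isinstance(node, ProcessingInstruction) and node.startswith("xml"):
            node.extract()
    return soup


print(strip_xml_declaration(XHTML))
```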

Scraping a list of URLs

邮差的信 submitted on 2019-12-08 09:23:17
Question: I am using Python 3.5 and trying to scrape a list of URLs (all from the same website). My code is as follows:

    import urllib.request
    from bs4 import BeautifulSoup

    url_list = ['URL1', 'URL2', 'URL3']

    def soup():
        for url in url_list:
            sauce = urllib.request.urlopen(url)
            for things in sauce:
                soup_maker = BeautifulSoup(things, 'html.parser')
                return soup_maker

    # Scraping
    def getPropNames():
        for propName in soup.findAll('div', class_="property-cta"):
            for h1 in propName.findAll('h1'):
                print(h1.text)

    def getPrice(
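In the code above, iterating over `sauce` yields the response one line at a time, so `soup()` builds a soup from a single line and returns it, and `getPropNames()` calls `findAll` on the function object rather than on a parsed document. A hedged sketch that parses each full response and collects results per URL; `fetch`, `parse_property_names` and `scrape` are hypothetical helpers. Passing `fetch` in as a parameter lets the parsing be tested without network access.

```python
import urllib.request
from bs4 import BeautifulSoup


def fetch(url):
    """Download the whole response body for one URL."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def parse_property_names(html):
    """Return the <h1> text inside each div.property-cta of one page."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        h1.get_text(strip=True)
        for div in soup.find_all("div", class_="property-cta")
        for h1 in div.find_all("h1")
    ]


def scrape(urls, fetch=fetch):
    """Parse every URL's full page and map each URL to its property names."""
    return {url: parse_property_names(fetch(url)) for url in urls}
```

Usage: `scrape(url_list)` with the real URLs; the same pattern extends to a `parse_price` helper for `getPrice`.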

Using bs4 to find an HTML tag (h2) that contains text

家住魔仙堡 submitted on 2019-12-08 08:10:20
Question: For this part of the HTML code:

    html3= """<a name="definition"> </a>
    <h2><span class="sectioncount">3.342.2323</span> Content Logical Definition <a title="link to here" class="self-link" href="valueset-investigation"><img src="ta.png"/></a></h2>
    <hr/>
    <div><p from the following </p><ul><li>Include these codes as defined in http://snomed.info/sct<table><tr><td><b>Code</b></td><td><b>Display</b></td></tr><tr><td>34353553</td><td>Examination / signs</td><td/></tr><tr><td>35453453453</td><td>History
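This `<h2>` holds a `<span>`, some text and an `<a>`, so its `.string` is `None` and `find("h2", string="...")` will not match it. Passing a function to `find()` and comparing against `get_text()` works instead. A sketch on a simplified copy of the snippet (the malformed `<p from the following </p>` is replaced with `<p>from the following</p>`); `find_h2_with_text` is a hypothetical helper, and moving on to the following `<div>` with `find_next_sibling` is an assumption about what is wanted next.

```python
from bs4 import BeautifulSoup

# Simplified copy of the question's snippet.
HTML = """<a name="definition"> </a>
<h2><span class="sectioncount">3.342.2323</span> Content Logical Definition <a title="link to here" class="self-link" href="valueset-investigation"><img src="ta.png"/></a></h2>
<hr/>
<div><p>from the following</p></div>"""


def find_h2_with_text(markup, text):
    soup = BeautifulSoup(markup, "html.parser")
    # The function receives each Tag; match on the h2's combined text.
    return soup.find(lambda tag: tag.name == "h2" and text in tag.get_text())


h2 = find_h2_with_text(HTML, "Content Logical Definition")
# The block that follows the heading (skips the <hr/> and whitespace).
print(h2.find_next_sibling("div").get_text(strip=True))
```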

Extracting text between tags using BeautifulSoup

大城市里の小女人 submitted on 2019-12-08 07:11:02
Question: Using BeautifulSoup, I am trying to extract text from a series of web pages that all follow a similar format. The HTML for the text I want to extract is below; the actual page is here: http://www.p2016.org/ads1/bushad120215.html.

    <p><span style="color: rgb(153, 153, 153);"></span><font size="-1"> <span style="font-family: Arial;"><big><span style="color: rgb(153, 153, 153);"></span></big></span></font><span style="color: rgb(153, 153, 153);"></span><font size="-1"><span style="font-family:
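The visible text is spread over several nested `<font>` and `<span>` tags, some of them empty, so reading a single tag's `.string` is unreliable. Joining the paragraph's `stripped_strings` collects every non-blank piece of text in document order. A minimal sketch on hypothetical markup shaped like the page's; `paragraph_text` is an illustrative helper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the same kind of nesting and empty spans.
HTML = """<p><span style="color: rgb(153, 153, 153);"></span><font size="-1">
<span style="font-family: Arial;">Ad transcript text.</span></font>
<font size="-1"><span style="font-family: Arial;">Second sentence.</span></font></p>"""


def paragraph_text(markup):
    soup = BeautifulSoup(markup, "html.parser")
    p = soup.find("p")
    # stripped_strings skips whitespace-only strings and trims the rest.
    return " ".join(p.stripped_strings)


print(paragraph_text(HTML))
```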

Accessing untagged text using beautifulsoup

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-08 03:56:37
Question: I am using Python and beautifulsoup4 to extract address information. More specifically, I need help retrieving non-US zip codes. Consider the following HTML for a US-based company (already a soup object):

    <div class="compContent curvedBottom" id="companyDescription">
    <div class="vcard clearfix">
    <p id="adr">
    <span class="street-address">999 State St Ste 100</span><br/>
    <span class="locality">Salt Lake City,</span>
    <span class="region">UT</span>
    <span class="zip"

Accessing untagged text using beautifulsoup

戏子无情 submitted on 2019-12-08 02:51:26
Question: I am using Python and beautifulsoup4 to extract address information. More specifically, I need help retrieving non-US zip codes. Consider the following HTML for a US-based company (already a soup object):

    <div class="compContent curvedBottom" id="companyDescription">
    <div class="vcard clearfix">
    <p id="adr">
    <span class="street-address">999 State St Ste 100</span><br/>
    <span class="locality">Salt Lake City,</span>
    <span class="region">UT</span>
    <span class="zip">84114-0002,</span>
    <br/><span class="country-name">United States</span>
    </p>
    <p>
    <span class="tel">
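Text that is not wrapped in its own tag appears as a bare `NavigableString` among the container's children, so it can be picked out by type. A hedged sketch: the non-US address is hypothetical (it assumes the postcode sits as untagged text after the locality, with no `span.zip`), and `untagged_text` is an illustrative helper.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical non-US address: the postcode has no <span class="zip"> wrapper.
HTML = """<p id="adr">
<span class="street-address">10 Example Street</span><br/>
<span class="locality">London</span> EC1A 1BB <br/>
<span class="country-name">United Kingdom</span>
</p>"""


def untagged_text(markup):
    soup = BeautifulSoup(markup, "html.parser")
    adr = soup.find("p", id="adr")
    # Direct children that are bare strings (not inside any tag), minus whitespace-only ones.
    return [s.strip() for s in adr.children if isinstance(s, NavigableString) and s.strip()]


print(untagged_text(HTML))
```

For US entries, `soup.find("span", class_="zip")` still works; the untagged scan is only the fallback when that span is missing.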