beautifulsoup

Difference between “findAll” and “find_all” in BeautifulSoup

两盒软妹~` submitted on 2019-12-27 12:07:53
Question: I would like to parse an HTML file with Python, and the module I am using is BeautifulSoup. It is said that the function find_all is the same as findAll. I've tried both of them, but I believe they are different:

    import urllib, urllib2, cookielib
    from BeautifulSoup import *
    site = "http://share.dmhy.org/topics/list?keyword=TARI+TARI+team_id%3A407"
    rqstr = urllib2.Request(site)
    rq = urllib2.urlopen(rqstr)
    fchData = rq.read()
    soup = BeautifulSoup(fchData)
    t = soup.findAll('tr')

Can anyone tell
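
In Beautiful Soup 4 (the bs4 package), findAll is kept as a legacy alias of find_all, so the two should return the same results; the older BeautifulSoup 3 package imported in the question only provides findAll. A minimal sketch, assuming bs4 is installed and using a made-up HTML snippet in place of the page from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = (
    "<table>"
    "<tr><td>TARI TARI 01</td></tr>"
    "<tr><td>TARI TARI 02</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

rows_new = soup.find_all("tr")  # PEP 8 style name used by bs4
rows_old = soup.findAll("tr")   # legacy spelling; newer bs4 releases may warn

new_text = [row.get_text() for row in rows_new]
old_text = [row.get_text() for row in rows_old]
print(new_text == old_text)
```

With BeautifulSoup 3, soup.find_all is not a method: attribute access is treated as a search for a tag named find_all, which usually evaluates to None, so calling it fails. That would explain why the two spellings appear to behave differently under the import shown in the question.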

Following links in python assignment using Beautifulsoup

非 Y 不嫁゛ submitted on 2019-12-26 13:33:08
Question: I have an assignment for a Python class where I have to start from a specific link at a specific position, then follow that link a specific number of times. Supposedly the first link has position 1. This is the link: http://python-data.dr-chuck.net/known_by_Fikret.html (traceback error picture) I have trouble locating the link: an "index out of range" error comes up. Can anyone help me figure out how to locate the link/position? This is my code:

    import urllib
    from

how to exclude all title with find?

穿精又带淫゛_ submitted on 2019-12-25 19:02:34
Question: I have a function that gets all the titles from my website. I don't want to get the titles of some products. Is this the right way? I don't want titles from products containing the words "OLP NL", "Arcserve", "LicSAPk", or "symantec":

    def get_title(u):
        html = requests.get(u)
        bsObj = BeautifulSoup(html.content, 'xml')
        title = str(bsObj.title).replace('<title>', '').replace('</title>', '')
        if (title.find('Arcserve') or title.find('OLP NL') or title.find('LicSAPk') or title
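
One pitfall in the condition above: str.find returns -1 when the substring is absent (which is truthy) and 0 when it sits at the very start (which is falsy), so chaining find calls with "or" does not test for membership. A minimal sketch using the "in" operator instead; the helper name is_excluded, the sample titles, and the case-insensitive matching are assumptions, not part of the original question:

```python
# Words taken from the question; matching is case-insensitive by assumption.
EXCLUDED_WORDS = ("OLP NL", "Arcserve", "LicSAPk", "symantec")


def is_excluded(title, words=EXCLUDED_WORDS):
    """Return True if the title contains any of the excluded words."""
    lowered = title.lower()
    return any(word.lower() in lowered for word in words)


# Hypothetical titles, for illustration only.
titles = [
    "Arcserve Backup 18",
    "Office chair",
    "Symantec Endpoint",
    "Win Server OLP NL",
]
kept = [title for title in titles if not is_excluded(title)]
print(kept)
```

Only "Office chair" survives the filter here, because each of the other titles contains one of the excluded words.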

how to save data in the db django model?

痴心易碎 submitted on 2019-12-25 18:47:30
Question: Good day. I can't work out what I'm doing wrong here. I was using this function-based view to store my scraped data in the database with a Django model, but now it's not saving any more, and I can't understand why. Any ideas?

    def weather_fetch(request):
        context = None
        corrected_rainChance = None
        url = 'http://weather.news24.com/sa/cape-town'
        extracted_city = url.split('/')[-1]
        city = extracted_city.replace('-', " ")
        print(city)
        url_request = urlopen(url)
        soup = BeautifulSoup(url

isinstance not working correctly with beautifulsoup(NameError)

杀马特。学长 韩版系。学妹 submitted on 2019-12-25 16:58:14
Question: I'm using isinstance to select some HTML tags and passing them to a BeautifulSoup function. The problem is that I keep getting NameErrors from what should be perfectly executable code.

    def horse_search(tag):
        return (tag.has_attr('href') and isinstance(tag.previous_element, span))
    ...
    for tag in soup.find_all(horse_search):
        print (tag)

    NameError: global name 'span' is not defined

Also, I'm getting errors from the example code in the BeautifulSoup documentation when using isinstance in conjunction with
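
The NameError arises because span is used as a bare Python name, and no variable or class called span exists; isinstance expects a class as its second argument. One possible rewrite, sketched under the assumption that bs4 is installed and that the intent is "the element parsed just before this link is a span tag", checks the Tag class and then compares the tag name as a string. The HTML snippet is made up for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: one link directly inside a span, one after plain text.
html = (
    '<div><span><a href="/horse/1">Red Rum</a></span>'
    '<p>See <a href="/about">about</a></p></div>'
)
soup = BeautifulSoup(html, "html.parser")


def horse_search(tag):
    """Match tags with an href whose previous element is a <span> tag."""
    prev = tag.previous_element
    return (
        tag.has_attr("href")
        and isinstance(prev, Tag)  # a class, not the bare name span
        and prev.name == "span"    # tag names are compared as strings
    )


matches = soup.find_all(horse_search)
hrefs = [tag["href"] for tag in matches]
print(hrefs)
```

The second link is skipped because its previous element is the text "See ", which is a NavigableString rather than a Tag.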

Python 2.7 BeautifulSoup email scraping stops before end of full database

喜夏-厌秋 submitted on 2019-12-25 16:47:15
Question: Hope you are all well! I'm new and using Python 2.7. I'm trying to extract emails from a publicly available directory website that does not seem to have an API. The site is http://www.tecomdirectory.com/companies.php?segment=&activity=&search=category&submit=Search and the code stops gathering emails at the bottom of the page, where it says "load more". Here is my code:

    import requests
    import re
    from bs4 import BeautifulSoup
    file_handler = open('mail.txt','w')
    soup = BeautifulSoup(requests

'charmap' codec can't encode character '\xae' While Scraping a Webpage

本秂侑毒 submitted on 2019-12-25 15:15:23
Question: I am web scraping with Python using BeautifulSoup, and I get this error when scraping a webpage:

    'charmap' codec can't encode character '\xae' in position 69: character maps to <undefined>

This is my Python code:

    hotel = BeautifulSoup(state.)
    print (hotel.select("div.details.cf span.hotel-name a"))
    # Tried: print (hotel.select("div.details.cf span.hotel-name a")).encode('utf-8')

Answer 1: We usually encounter this problem here when we are trying to .encode() an already encoded byte string. So you might
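
The same message can be reproduced without any scraping: it appears whenever text is encoded with a single-byte "charmap" code page that has no slot for the character, which is a common situation when printing to a Windows console. A minimal sketch, assuming cp437 as the target encoding (the question does not say which code page was in use) and a made-up hotel name:

```python
# Hypothetical scraped text containing the registered sign (U+00AE).
text = "Hilton\xae Garden Inn"

error_message = None
try:
    text.encode("cp437")  # cp437 has no mapping for U+00AE
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Replace unencodable characters instead of raising.
safe_bytes = text.encode("cp437", errors="replace")
printable = safe_bytes.decode("cp437")

print(error_message)
print(printable)
```

Passing errors="replace" trades the exception for a "?" placeholder; writing the output to a file opened with encoding="utf-8" would keep the original character instead.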

List not allowing .splitlines() - Python

时光总嘲笑我的痴心妄想 submitted on 2019-12-25 14:45:38
Question: What do I need to do to prevent the error AttributeError: 'list' object has no attribute 'splitlines' from occurring here? How do I convert the list that I have into a form that splitlines can be applied to?

    import requests
    import re
    from bs4 import BeautifulSoup
    import csv

    #Read csv
    with open ("gyms4.csv") as file:
        reader = csv.reader(file)
        csvfilelist = [row[0] for row in reader]
    print csvfilelist

    #Get data from each url
    def get_page_data():
        for page_data in csvfilelist.splitlines():
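
splitlines is a string method, and csvfilelist is already a list with one entry per CSV row, so the list can be iterated directly. A minimal sketch in Python 3 syntax, using an in-memory CSV with made-up URLs in place of gyms4.csv:

```python
import csv
import io

# Hypothetical file contents; the real gyms4.csv is not shown in the question.
csv_text = (
    "http://example.com/gym1,Gym One\n"
    "http://example.com/gym2,Gym Two\n"
)

reader = csv.reader(io.StringIO(csv_text))
urls = [row[0] for row in reader if row]  # first column, skipping blank rows

visited = []
for url in urls:  # no splitlines() needed: urls is already a list
    visited.append(url.strip())

print(visited)
```

Each url in the loop is a plain string, so it can be passed to requests.get or any other per-page function from there.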