beautifulsoup

BeautifulSoup4 - Concatenating multiple html elements between two different tags

Submitted by 霸气de小男生 on 2020-02-25 05:26:32
Question: I am scraping a page using Python and bs4. The HTML source that I get from bs4 is as follows (cleaned up a bit for readability): <p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica, sans-serif"> <span style="font-size:14.0px"><span style="font-family:Arial, Helvetica, sans-serif"> <strong>COMPANY DESCRIPTION</strong><br> Here goes the first para of company description</span></span></p> <p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica,
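
One possible approach, sketched under the assumption (taken from the snippet above) that each section starts with a `<strong>` heading inside a `<p>` and runs until the next `<strong>`: walk the `<p>` tags in order and concatenate their text under the most recent heading. The function name `sections_by_heading` is hypothetical, and the code is written for Python 3.

```python
from typing import Dict, List

from bs4 import BeautifulSoup


def sections_by_heading(html: str) -> Dict[str, str]:
    """Group the text of consecutive <p> tags under the <strong> heading that precedes them."""
    soup = BeautifulSoup(html, "html.parser")
    sections: Dict[str, List[str]] = {}
    current = None
    for p in soup.find_all("p"):
        strong = p.find("strong")
        if strong is not None:
            # A new heading starts a new section; remove it so only body text remains.
            current = strong.get_text(strip=True)
            strong.extract()
            sections[current] = []
        if current is None:
            continue  # paragraphs before the first heading are ignored
        text = p.get_text(" ", strip=True)
        if text:
            sections[current].append(text)
    # Concatenate each section's paragraphs into one string.
    return {heading: "\n".join(parts) for heading, parts in sections.items()}
```

If the real page marks headings differently, only the `p.find("strong")` test needs to change.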

Extract text with line break in BeautifulSoup

Submitted by 青春壹個敷衍的年華 on 2020-02-25 04:10:07
Question: I'd like to extract text with BeautifulSoup, keeping a line break wherever there is a "br" tag. html = "<td class="s4 softmerge" dir="ltr"><div class="softmerge-inner" style="width: 5524px; left: -1px;">But when he saw many of the Pharisees and Sadducees come to his baptism, he said unto them, <br/>O generation of vipers, who hath warned you to flee from the wrath to come?<br/>Bring forth therefore fruits meet for repentance:<br/>And think not to say within yourselves, We have Abraham to our father: for I
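
A minimal sketch of one common approach: replace every `<br>` element with a newline string before calling `get_text()`, so each break becomes a line boundary in the extracted text. The helper name `text_with_line_breaks` is hypothetical.

```python
from bs4 import BeautifulSoup


def text_with_line_breaks(html: str) -> str:
    """Return the text of `html`, with each <br> turned into a newline."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so it is safe to modify the tree while looping over it.
    for br in soup.find_all("br"):
        br.replace_with("\n")
    return soup.get_text()
```

Calling `.splitlines()` on the result then gives one list entry per line of the original markup.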

Webscraping Instagram follower count BeautifulSoup

Submitted by 丶灬走出姿态 on 2020-02-24 08:45:32
Question: I'm just starting to learn web scraping with BeautifulSoup and want to write a simple program that gets the follower count for a given Instagram page. I currently have the following script (taken from another Q&A thread): import requests from bs4 import BeautifulSoup user = "espn" url = 'https://www.instagram.com/'+ user r = requests.get(url) soup = BeautifulSoup(r.content) followers = soup.find('meta', {'name': 'description'})['content'] follower_count = followers.split('Followers
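
A minimal sketch of the parsing step, assuming the description meta tag reads like "17.6m Followers, 120 Following, 9,001 Posts - ...". That format is an assumption about Instagram's markup, which changes often and may not be served to an unauthenticated `requests.get` at all, so the sketch only parses HTML you already have. It passes an explicit parser to BeautifulSoup and adds a hypothetical `parse_count` helper for abbreviated counts.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Assumed multipliers for abbreviated counts such as "17.6m" or "2.5k".
_SUFFIXES = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}


def follower_text(html: str) -> Optional[str]:
    """Return the text before 'Followers' in the description meta tag, if present."""
    soup = BeautifulSoup(html, "html.parser")  # explicit parser avoids bs4's warning
    # attrs= is needed because `name` is already find()'s tag-name parameter.
    meta = soup.find("meta", attrs={"name": "description"})
    if meta is None or "Followers" not in meta.get("content", ""):
        return None
    return meta["content"].split("Followers")[0].strip()


def parse_count(text: str) -> int:
    """Convert '1,234', '17.6m' or '2.5k' into an integer."""
    cleaned = text.replace(",", "").strip().lower()
    if cleaned and cleaned[-1] in _SUFFIXES:
        return int(round(float(cleaned[:-1]) * _SUFFIXES[cleaned[-1]]))
    return int(cleaned)
```

Returning `None` instead of indexing `['content']` directly keeps the script from crashing when the tag is missing.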

Unable to modify few fields in a webpage issuing a post request

Submitted by 青春壹個敷衍的年華 on 2020-02-15 08:21:49
Question: I've created a Python script that uses the requests module together with the BeautifulSoup library to fill in some small forms across different pages of a website. I need to issue multiple GET and POST requests to accomplish this, as Selenium is not an option here. I'm only interested in modifying the fields in step 2, captioned personal information. How to reproduce: after logging in using the email and password (available within the script) it is necessary to choose (by default yes
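
A frequent cause of POSTs that appear to succeed but change nothing is a payload that omits the form's hidden fields (tokens, step markers). The sketch below, which assumes that is the issue here, collects every field a browser would submit from the form's HTML and then applies your overrides. The function `build_form_payload` and its parameters are hypothetical, not part of requests or BeautifulSoup.

```python
from typing import Dict, Optional, Tuple

from bs4 import BeautifulSoup

# Input types a browser does not send as ordinary key/value pairs.
_SKIP_TYPES = {"submit", "button", "image", "file", "reset"}


def build_form_payload(
    html: str, overrides: Dict[str, str], form_selector: str = "form"
) -> Tuple[Dict[str, str], Optional[str]]:
    """Collect a form's current field values, then apply the caller's overrides."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.select_one(form_selector)
    if form is None:
        raise ValueError(f"no form matches {form_selector!r}")

    payload: Dict[str, str] = {}
    for field in form.find_all(["input", "textarea", "select"]):
        name = field.get("name")
        if not name:
            continue  # unnamed fields are never submitted
        if field.name == "textarea":
            payload[name] = field.get_text()
        elif field.name == "select":
            options = field.find_all("option")
            # Use the selected option, falling back to the first one like a browser.
            chosen = [o for o in options if o.has_attr("selected")] or options[:1]
            if chosen:
                payload[name] = chosen[0].get("value", chosen[0].get_text())
        else:
            kind = field.get("type", "text").lower()
            if kind in _SKIP_TYPES:
                continue
            if kind in ("checkbox", "radio") and not field.has_attr("checked"):
                continue  # unchecked boxes are omitted by browsers
            payload[name] = field.get("value", "")

    payload.update(overrides)  # e.g. the personal-information fields to change
    return payload, form.get("action")
```

In use, you would fetch the step-2 page with a `requests.Session`, build the payload from `resp.text`, and post it with `session.post(urljoin(page_url, action), data=payload)`; the URL and field names depend on the site and are not known from the question.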

Deleting a div with a particular class using BeautifulSoup

Submitted by 霸气de小男生 on 2020-02-12 08:49:41
Question: I want to delete a specific div from a soup object. I am using Python 2.7 and bs4. According to the documentation we can use div.decompose(), but that would delete all the divs. How can I delete a div with a specific class? Answer 1: Sure, you can just select, find, or find_all the divs of interest in the usual way, and then call decompose() on those divs. For instance, if you want to remove all divs with class sidebar, you could do that with # replace with `soup.findAll` if you are using
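
A minimal sketch of the answer's approach for Python 3 with bs4: filter `find_all` by class with the `class_` keyword, then call `decompose()` on each match. The helper name `remove_divs_with_class` is hypothetical; `soup.select("div.sidebar")` would find the same elements.

```python
from bs4 import BeautifulSoup


def remove_divs_with_class(html: str, class_name: str) -> str:
    """Delete every <div> whose class list contains `class_name`; return the new HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches if class_name is any one of the element's classes.
    for div in soup.find_all("div", class_=class_name):
        # Skip a div already destroyed because it was nested inside an earlier match.
        if getattr(div, "decomposed", False):
            continue
        div.decompose()
    return str(soup)
```

Other divs are untouched, which is the difference from decomposing every div in the document.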

Why does this code generate multiple files? I want 1 file with all entries in it

Submitted by 扶醉桌前 on 2020-02-06 17:52:30
Question: I'm trying to work with both BeautifulSoup and XPath using the following code, but now I'm getting one file per URL, whereas before I was getting one file for all the URLs. I just moved over the reading from CSV to get the list of URLs, and also just added the parsing of the URL and response. But when I run this now I get a lot of individual files, and in some cases one file may actually contain two scraped pages' data. So do I need to move my file saving out (indent) import
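
The usual fix is structural: open the output file once, before looping over the URLs, and write every page's rows through that one writer. A minimal sketch follows; since the original code is truncated, the `scrape` callable stands in for your own BeautifulSoup/XPath parsing, and all names here are hypothetical.

```python
import csv
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def read_urls(csv_path: str) -> List[str]:
    """Read URLs from the first column of a CSV file, skipping blank rows."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return [row[0].strip() for row in csv.reader(fh) if row and row[0].strip()]


def scrape_all_to_one_csv(
    urls: Iterable[str],
    scrape: Callable[[str], List[Row]],
    out_path: str,
    fieldnames: List[str],
) -> int:
    """Scrape every URL and write all rows into ONE output file; return the row count."""
    written = 0
    # Open the output once, outside the loop. Opening it inside the loop
    # (or naming it after each URL) is what produces one file per page.
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for url in urls:
            for row in scrape(url):
                writer.writerow(row)
                written += 1
    return written
```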