beautifulsoup

Beautiful Soup 'ResultSet' object has no attribute 'text'

本秂侑毒 submitted on 2020-04-07 03:00:43
Question:

    from bs4 import BeautifulSoup
    import urllib.request
    import win_unicode_console

    win_unicode_console.enable()

    link = ('https://pietroalbini.io/')
    req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
    url = urllib.request.urlopen(req).read()
    soup = BeautifulSoup(url, "html.parser")
    body = soup.find_all('div', {"class":"wrapper"})
    print(body.text)

Hi, I have a problem with Beautiful Soup: if I run this code without ".text" at the end, it shows me a list of divs, but if I add "...
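The error in the title follows from what find_all returns: a ResultSet (a list-like collection of tags), which has no .text attribute. The attribute lives on the individual Tag objects inside it. A minimal sketch, assuming a made-up HTML snippet in place of the downloaded page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = """
<div class="wrapper"><p>First block</p></div>
<div class="wrapper"><p>Second block</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet; read the text from each Tag inside it.
wrappers = soup.find_all("div", {"class": "wrapper"})
texts = [div.get_text(strip=True) for div in wrappers]

# find returns the first matching Tag (or None), so text can be read directly.
first = soup.find("div", {"class": "wrapper"})
first_text = first.get_text(strip=True) if first is not None else ""

print(texts)
print(first_text)
```

If only one matching div is expected, find is usually the simpler choice; otherwise loop over the ResultSet.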

Make BeautifulSoup handle line breaks as a browser would

亡梦爱人 submitted on 2020-04-07 02:58:25
Question: I'm using BeautifulSoup (version '4.3.2' with Python 3.4) to convert HTML documents to text. The problem I'm having is that web pages sometimes contain newline characters ("\n") that would not be rendered as a new line in a browser, but when BeautifulSoup converts the page to text, it leaves the "\n" in. Example: your browser probably renders the following all on one line (even though it has a newline character in the middle): This is a paragraph. And your browser probably renders the following...
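One rough approximation of browser behaviour is to collapse every run of whitespace inside a paragraph into a single space and to emit a line break only between paragraphs. The sketch below assumes the text of interest sits in p tags; it ignores br, pre, and other block-level elements, which a fuller solution would need to handle. The sample markup and the helper name paragraphs_as_text are illustrative.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical input: a newline inside a paragraph that a browser would not render.
html = "<p>This is a\nparagraph.</p><p>Second   one.</p>"

def paragraphs_as_text(markup):
    """Return one line of text per <p>, with inner whitespace collapsed."""
    soup = BeautifulSoup(markup, "html.parser")
    lines = []
    for p in soup.find_all("p"):
        # Collapse any run of whitespace (including "\n") into one space.
        lines.append(re.sub(r"\s+", " ", p.get_text()).strip())
    return "\n".join(lines)

result = paragraphs_as_text(html)
print(result)
```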

How to get all listings urls from main page with python web scraping

扶醉桌前 submitted on 2020-04-07 01:06:14
Question: I wrote code for web scraping, and it works except for two issues. On the detail page, everything is fine except the ISBN number, and on the main page I need all the listing URLs so that my code can scrape data from all listings. Please guide me on how to fix this. Both URLs (main page and detail page) are in the code. Thank you! Here is my code:

    import requests
    from bs4 import BeautifulSoup
    import csv

    def get_page(url):
        response = requests.get(url)
        if not response.ok:
            print('server responded:', ...
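Collecting every listing URL from a main page usually comes down to selecting the anchor tags and resolving their href values against the page URL. The sketch below is only an illustration: the base URL, the markup, and the li.listing selector are assumptions, since the real page structure is not shown in the question.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/books/"  # hypothetical main page URL

# Hypothetical main-page markup.
html = """
<ul>
  <li class="listing"><a href="/books/1">Book one</a></li>
  <li class="listing"><a href="/books/2">Book two</a></li>
  <li class="listing"><a href="/books/1">Book one again</a></li>
</ul>
"""

def get_listing_urls(markup, base_url):
    """Return absolute listing URLs in page order, without duplicates."""
    soup = BeautifulSoup(markup, "html.parser")
    urls = []
    for a in soup.select("li.listing a[href]"):
        url = urljoin(base_url, a["href"])  # resolve relative links
        if url not in urls:
            urls.append(url)
    return urls

listing_urls = get_listing_urls(html, BASE_URL)
print(listing_urls)
```

Each URL in the returned list could then be passed to the existing detail-page function in a loop.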

Web scraping program cannot find element which I can see in the browser

北战南征 submitted on 2020-04-06 21:44:07
Question: I am trying to get the titles of the streams on https://www.twitch.tv/directory/game/Dota%202, using Requests and BeautifulSoup. I know that my search criteria are correct, yet my program does not find the elements I need. The HTML source as shown in the browser:

    <div class="tw-media-card-meta__title">
      <div class="tw-c-text-alt">
        <a class="tw-full-width tw-interactive tw-link tw-link--button tw-link--hover-underline-none tw...
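A common cause of this symptom (an assumption here, since the question is cut off) is that the page builds those elements with JavaScript after it loads, so the HTML returned by Requests never contains them. A quick check is to count matches in the raw response. The two HTML strings below are invented stand-ins for the raw response and the browser-rendered DOM.

```python
from bs4 import BeautifulSoup

def count_matches(markup, css_class):
    """Count elements carrying css_class in the given HTML."""
    soup = BeautifulSoup(markup, "html.parser")
    return len(soup.find_all(class_=css_class))

# Hypothetical: what an HTTP client might receive (an empty application shell)...
raw_html = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
# ...versus what the browser shows after its scripts have run.
rendered_html = (
    '<div id="root"><div class="tw-media-card-meta__title">'
    "<a>Stream title</a></div></div>"
)

raw_count = count_matches(raw_html, "tw-media-card-meta__title")
rendered_count = count_matches(rendered_html, "tw-media-card-meta__title")
print(raw_count, rendered_count)
```

If the count is zero for the real response body, the data probably has to come from a browser automation tool or from an API the site exposes, rather than from the static HTML.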

Log in to a problematic site using requests

和自甴很熟 submitted on 2020-04-06 21:43:37
Question: I'm trying to create a script in Python using the requests module to log in to this site. I'm using my credentials, but I can't find a way to do so because I can't see (in Chrome dev tools) the parameters that need to be sent along with the request. username: SIMMTH.iqbal_123 password: SShift_123. This is my initial attempt (I really could not find anything in that page to start with):

    import requests
    from bs4 import BeautifulSoup

    link = "https://jobs.allianz.com/sap/bc/bsp/sap...
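When the required parameters are not obvious in dev tools, one place to look is the login form itself: hidden input fields often carry tokens that must be posted back with the credentials. The sketch below parses an invented form; the field names (csrf_token, username, password) and the action path are assumptions and will differ on a real site.

```python
from bs4 import BeautifulSoup

# Hypothetical login form; real field names will differ.
login_page = """
<form id="login" action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

def build_login_payload(markup, username, password):
    """Collect every named input of the first form, then fill in credentials."""
    soup = BeautifulSoup(markup, "html.parser")
    form = soup.find("form")
    payload = {}
    for field in form.find_all("input"):
        name = field.get("name")
        if name:
            payload[name] = field.get("value", "")
    payload["username"] = username
    payload["password"] = password
    return form.get("action"), payload

action, payload = build_login_payload(login_page, "user", "secret")
print(action, payload)
```

With a requests.Session, the login page would be fetched first so that cookies are kept, and the payload would then be posted to the form's action URL. Sites that build the request in JavaScript may need additional fields that this approach cannot see.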

How to scrape multiple result having same tags and class

核能气质少年 submitted on 2020-03-28 06:41:46
Question: My code works for a single page, but when I run it over multiple records in a for loop and some data is missing, the values shift. I use index numbers such as [1] and [2] for the person, location, phone number, and cell number variables, so if something like the person name is missing, the next record's value is extracted into the person variable. Could you please fix this issue? Here is my code:

    import requests
    from bs4 import BeautifulSoup
    import re

    def get_page(url):
        response = requests.get...
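Positional indexing shifts whenever a field is absent. One way around this is to parse each record inside its own container and to identify fields by their label rather than by position. The sketch assumes each record has a wrapping element and each field carries a visible label; the markup, class names, and field names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second record has no "Person" entry.
html = """
<div class="record">
  <p class="info"><b>Person:</b> Ann</p>
  <p class="info"><b>Phone:</b> 111</p>
</div>
<div class="record">
  <p class="info"><b>Phone:</b> 222</p>
</div>
"""

FIELDS = ("Person", "Phone")

def parse_record(record):
    """Map each known label to its value; missing fields stay None."""
    data = {field: None for field in FIELDS}
    for p in record.find_all("p", class_="info"):
        label_tag = p.find("b")
        if label_tag is None:
            continue
        label = label_tag.get_text(strip=True).rstrip(":")
        if label in data:
            # Remove the label so only the value remains in the paragraph.
            label_tag.extract()
            data[label] = p.get_text(strip=True)
    return data

soup = BeautifulSoup(html, "html.parser")
records = [parse_record(rec) for rec in soup.find_all("div", class_="record")]
print(records)
```

Because every lookup is scoped to one record, a missing name yields None for that record instead of pulling a value from the next one.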

BeautifulSoup cannot locate table with specific class [duplicate]

大城市里の小女人 submitted on 2020-03-27 08:44:12
Question: This question already has answers here: Beautiful Soup: 'ResultSet' object has no attribute 'find_all'? (3 answers). Closed 3 days ago.

Essentially, I am attempting to extract the text from the table with the class given below. I have already written the rest of the code, which extracts the text from each of the rows, so I do not need any assistance with that part. I just cannot figure out why I am receiving this error: "ResultSet object has no attribute '%s'. You're probably...
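This error typically appears when find or find_all is called on the result of an earlier find_all, which is a ResultSet rather than a single Tag. A minimal sketch, assuming a table with the hypothetical class name "data": select the single table with find, then iterate over its rows.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; only the one with class "data" is wanted.
html = """
<table class="data"><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>
<table class="other"><tr><td>x</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns one Tag (or None), so find_all can be called on it.
table = soup.find("table", class_="data")

rows = []
if table is not None:
    for tr in table.find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)
```

If several tables share the class, looping over soup.find_all("table", class_="data") and calling find_all("tr") on each element works the same way.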

Read Time out when attempting to request a page

拈花ヽ惹草 submitted on 2020-03-26 04:03:39
Question: I am attempting to scrape websites and sometimes get the error below. It is concerning because it appears randomly, but when I retry, the error does not occur.

    requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.somewebsite.com', port=443): Read timed out. (read timeout=None)

My code looks like the following:

    from bs4 import BeautifulSoup
    from random_user_agent.user_agent import UserAgent
    from random_user_agent.params import SoftwareName, OperatingSystem
    import requests

    software_names...
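The "read timeout=None" in the message suggests that no timeout argument was passed to Requests. A common way to handle intermittent timeouts is to set explicit timeouts and let the session retry automatically. The sketch below uses requests with urllib3's Retry; the retry count, backoff factor, status codes, and timeout values are illustrative choices, and make_session and fetch are hypothetical helper names.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff=0.5):
    """Build a Session that retries failed requests with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

def fetch(session, url):
    """Fetch a page with explicit (connect, read) timeouts in seconds."""
    response = session.get(url, timeout=(5, 30))
    response.raise_for_status()
    return response.text

session = make_session()
```

Calling fetch(session, url) in place of a bare requests.get keeps the rest of the scraping code unchanged, while a slow server gets a few more attempts before an exception is raised.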