beautifulsoup | 易学教程

Beautiful Soup 'ResultSet' object has no attribute 'text'

Question:

    from bs4 import BeautifulSoup
    import urllib.request
    import win_unicode_console

    win_unicode_console.enable()
    link = ('https://pietroalbini.io/')
    req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
    url = urllib.request.urlopen(req).read()
    soup = BeautifulSoup(url, "html.parser")
    body = soup.find_all('div', {"class":"wrapper"})
    print(body.text)

Hi, I have a problem with Beautiful Soup: if I run this code without ".text" at the end, it shows me a list of divs, but if I add "
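
The error in the title arises because find_all() returns a ResultSet, which behaves like a list of Tag objects and has no .text of its own. The sketch below is one possible way around it, not the accepted answer from the original thread: read the text from each element, or use find() when a single element is enough. SAMPLE_HTML and both function names are made up for illustration and stand in for the live page the question downloads.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption); the original script fetches a live page instead.
SAMPLE_HTML = """
<div class="wrapper"><p>First block</p></div>
<div class="wrapper"><p>Second block</p></div>
"""


def wrapper_texts(html):
    """Return the text of every <div class="wrapper"> in the document."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() yields a ResultSet, so the text is read per element
    # rather than from the ResultSet itself.
    return [div.get_text(strip=True)
            for div in soup.find_all("div", {"class": "wrapper"})]


def first_wrapper_text(html):
    """Return the text of the first matching div, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns a single Tag (or None), which does have .text / get_text().
    div = soup.find("div", {"class": "wrapper"})
    return div.get_text(strip=True) if div is not None else None
```

Checking for None before calling get_text() avoids a second, closely related AttributeError when the page has no matching element.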

Make BeautifulSoup handle line breaks as a browser would

Question: I'm using BeautifulSoup (version '4.3.2' with Python 3.4) to convert HTML documents to text. The problem I'm having is that web pages sometimes contain newline characters ("\n") that a browser would not render as a new line, but when BeautifulSoup converts the page to text, it leaves the "\n" in. Example: Your browser probably renders the following all on one line (even though it has a newline character in the middle): This is a paragraph. And your browser probably renders the following
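
A rough approximation of the browser behaviour described above is to collapse each run of whitespace in the extracted text into a single space. The helper below is only a sketch under that assumption: collapse_whitespace is a hypothetical name, it would be applied to a string already extracted from one element (for example the result of get_text()), and it deliberately ignores cases a real browser treats differently, such as <br>, <pre>, and breaks between block-level elements.

```python
import re

# The whitespace characters that HTML collapses; non-breaking spaces
# (U+00A0) are intentionally left alone, as browsers do not collapse them.
_HTML_WHITESPACE = re.compile(r"[ \t\r\n\f]+")


def collapse_whitespace(text):
    """Replace every run of HTML whitespace with one space and trim the ends."""
    return _HTML_WHITESPACE.sub(" ", text).strip()
```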

How to get all listings urls from main page with python web scraping

Question: I wrote some code for web scraping. It works except for two issues. On the detail page, everything is fine except the ISBN number, and from the main page I need all the listing URLs so that my code can scrape data from all listings. Please guide me on how to fix this. Both URLs (main page and detail page) are in the code. Thank you! Here is my code:

    import requests
    from bs4 import BeautifulSoup
    import csv

    def get_page(url):
        response = requests.get(url)
        if not response.ok:
            print('server responded:',

Web scraping program cannot find element which I can see in the browser

Question: I am trying to get the titles of the streams on https://www.twitch.tv/directory/game/Dota%202, using Requests and BeautifulSoup. I know that my search criteria are correct, yet my program does not find the elements I need. Here is a screenshot showing the relevant part of the source code in the browser: The HTML source as text: <div class="tw-media-card-meta__title"> <div class="tw-c-text-alt"> <a class="tw-full-width tw-interactive tw-link tw-link--button tw-link--hover-underline-none tw

Log in to a problematic site using requests

Question: I'm trying to create a script in Python using the requests module to log in to this site. I'm using my credentials, but I can't find a way to do so because I can't see (in Chrome dev tools) the parameters that need to be sent along with the request. username: SIMMTH.iqbal_123 password: SShift_123 The login form looks like this. This is my initial attempt (I really could not find anything in that page to start with):

    import requests
    from bs4 import BeautifulSoup

    link = "https://jobs.allianz.com/sap/bc/bsp/sap

How to scrape multiple result having same tags and class

Question: My code works for a single page, but when I run it over multiple records in a for loop and some data is missing, the fields get misaligned. I use fixed index numbers [1] and [2] for the person, location, phone number and cell number variables, so if something such as the person name is missing, the next value is extracted into the person variable. Could you please help me fix this issue? Here is my code:

    import requests
    from bs4 import BeautifulSoup
    import re

    def get_page(url):
        response = requests.get
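
One common way to keep fields from shifting when a value is missing is to look each field up by its own selector inside each record, falling back to None, instead of relying on positional indexes. The sketch below assumes a markup layout that the truncated question does not show: the record container and the .person and .phone class names are hypothetical, as are SAMPLE_HTML and the function names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second record has no person name.
SAMPLE_HTML = """
<div class="record"><span class="person">Ann</span><span class="phone">111</span></div>
<div class="record"><span class="phone">222</span></div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match under parent, or None."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_records(html):
    """Parse each record independently so a missing field stays None."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "person": text_or_none(record, ".person"),
            "phone": text_or_none(record, ".phone"),
        }
        for record in soup.select("div.record")
    ]
```

Because every lookup is scoped to one record, a missing person name yields None for that record only and cannot pull in a value from a neighbouring field or record.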

BeautifulSoup cannot locate table with specific class [duplicate]

Question: This question already has answers here: Beautiful Soup: 'ResultSet' object has no attribute 'find_all'? (3 answers). Closed 3 days ago. Essentially, I am attempting to extract the text from the table with the given class title below. I have the rest of the code already written that extracts the text from each of the rows, so I do not need any assistance with that aspect. I just cannot seem to figure out why I am receiving this error: "ResultSet object has no attribute '%s'. You're probably

Read Time out when attempting to request a page

Question: I am attempting to scrape websites and I sometimes get the error below. It is concerning because it occurs randomly, and when I retry, the error does not occur.

    requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.somewebsite.com', port=443): Read timed out. (read timeout=None)

My code looks like the following:

    from bs4 import BeautifulSoup
    from random_user_agent.user_agent import UserAgent
    from random_user_agent.params import SoftwareName, OperatingSystem
    import requests

    software_names
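
Since the error goes away on retry, one option is to wrap the request in a small retry loop with a growing pause between attempts. The helper below is a generic sketch, not code from the original thread: call_with_retries and its parameters are hypothetical names, and the commented usage line assumes the caller passes an explicit timeout to requests.get.

```python
import time


def call_with_retries(func, retry_on, attempts=3, delay=1.0):
    """Call func() and retry when it raises retry_on.

    func     -- zero-argument callable that performs the request
    retry_on -- exception class (or tuple of classes) that triggers a retry
    attempts -- total number of tries, including the first one
    delay    -- base pause in seconds; it grows linearly with each attempt
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: let the last error propagate
            time.sleep(delay * attempt)


# Possible usage with requests (url is a placeholder):
# response = call_with_retries(
#     lambda: requests.get(url, timeout=(5, 30)),  # (connect, read) seconds
#     requests.exceptions.ReadTimeout,
# )
```

Passing the exception type in keeps the helper independent of any one HTTP library, and an explicit timeout makes a slow server fail in bounded time rather than according to lower-level defaults.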