beautifulsoup | 易学教程

Python Beautifulsoup Getting Attribute Value

Question: I'm having difficulty getting the proper syntax to extract the value of an attribute in BeautifulSoup with HTML 5. I've isolated the occurrence of a tag in my soup using the proper syntax where there is an HTML 5 issue: tags = soup.find_all(attrs={"data-topic":"recUpgrade"}) Taking just tags[1]: date = tags[1].find(attrs={"data-datenews":True}) and date here is: <span class="invisible" data-datenews="2018-05-25 06:02:19" data-idnews="2736625" id="horaCompleta"></span> But now I want to
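
The excerpt is cut off before it says what is wanted, but the title asks for an attribute value. A minimal sketch, assuming the goal is to read data-datenews from the <span> quoted above: a Tag supports dictionary-style access to its attributes. The attribute name "data-missing" is invented here only to show the fallback behaviour.

```python
from bs4 import BeautifulSoup

# The <span> is copied from the question; "html.parser" is the parser
# backed by Python's standard library.
html = ('<span class="invisible" data-datenews="2018-05-25 06:02:19" '
        'data-idnews="2736625" id="horaCompleta"></span>')
soup = BeautifulSoup(html, "html.parser")

# Same lookup as in the question: any tag that has a data-datenews attribute.
date = soup.find(attrs={"data-datenews": True})

# Subscript access raises KeyError if the attribute is absent.
value = date["data-datenews"]

# .get() returns None instead of raising.
# "data-missing" is a made-up attribute name used only for illustration.
missing = date.get("data-missing")

print(value)
```

Subscript access suits attributes that must exist; .get() is the safer choice when some tags may lack the attribute.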

BeautifulSoup - find_all div tags with different class name

Question: I want to select all <div> elements whose class is either "post has-profile bg2" or "post has-profile bg1", but not the last one, i.e. "panel": <div id="6" class="post has-profile bg2"> some text 1 </div> <div id="7" class="post has-profile bg1"> some text 2 </div> <div id="8" class="post has-profile bg2"> some text 3 </div> <div id="9" class="post has-profile bg1"> some text 4 </div> <div class="panel bg1" id="abc"> ... </div> select() is matching only a single occurrence. I'm trying it with find_all(), but
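
One possible approach, sketched against the markup quoted in the question: both wanted variants share the classes post and has-profile, so a CSS selector that requires those two classes matches all four posts and skips the panel. This is an illustration only, not necessarily the answer given on the original page.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div.post.has-profile" matches every <div> carrying both classes,
# whatever other classes (bg1, bg2) it also has.
posts = soup.select("div.post.has-profile")
ids = [div["id"] for div in posts]

print(ids)
```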

BeautifulSoup not extracting all html

Question: We are trying to get product URLs from this page of Forever 21's site (http://www.forever21.com/Product/Category.aspx?br=f21&category=dress&pagesize=100&page=1). For some reason, BeautifulSoup is not getting the elements with class "item_pic", even though they are in the site's HTML. We have tried using requests, mechanize, and selenium, with no luck. All the commented code is from previous attempts to get the HTML (none of which worked). Here is our code: from bs4 import BeautifulSoup

Prevent BeautifulSoup's renderContents() from changing to Â

Question: I'm using bs4 to do some work on some text, but in some cases it converts characters to Â . As best I can tell, this is an encoding mismatch from UTF-8 to latin1 (or the reverse?). Everything in my web app is UTF-8, Python 3 is UTF-8, and I've confirmed the database is UTF-8. I've narrowed down the problem to this one line: print("Before soup: " + text) # Before soup: soup = BeautifulSoup(text, "html.parser") #.... do stuff to soup, but all commented out for this testing. soup =
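
The excerpt ends before the offending line is shown, so the actual cause is unknown. The following sketch only illustrates the mismatch the asker suspects: a non-breaking space (U+00A0) encoded as UTF-8 and then decoded as latin-1 turns into "Â" followed by the non-breaking space. The sample string is invented.

```python
# Invented sample text containing a non-breaking space (U+00A0).
text = "price:\u00a0100"

# UTF-8 encodes U+00A0 as the two bytes 0xC2 0xA0.
utf8_bytes = text.encode("utf-8")

# Reading those bytes as latin-1 maps 0xC2 to "Â" and 0xA0 to U+00A0,
# which produces the stray "Â" described in the question.
misread = utf8_bytes.decode("latin-1")

# Reversing the wrong step recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")

print(repr(misread))
print(repaired == text)
```

If output bytes produced as UTF-8 are later interpreted as latin-1 anywhere in the pipeline, this is the pattern that appears.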

Chinese character encoding error with BeautifulSoup in Python?

Question: I'd like to use BeautifulSoup to get the data in a table from a website, but it can't read the Chinese characters correctly. This is my code: #!/usr/bin/env python # -*- coding: utf-8 -*- import urllib2 from bs4 import BeautifulSoup html=urllib2.urlopen("http://www.515fa.com/che_1978.html").read() soup=BeautifulSoup(html,from_encoding="UTF-8") print soup.prettify() And the Chinese characters are displayed like this: <td align="center" bgcolor="#FFFFFF" u1:str="" width="173"> ćé¸</td> <td
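
A hedged Python 3 sketch of one common cause: the page bytes are in a Chinese legacy encoding rather than UTF-8. The page's real encoding is not stated in the excerpt, so GBK is an assumption here, and the sample cell text is invented. gb18030 is a superset of GBK, so it decodes GBK bytes as well.

```python
from bs4 import BeautifulSoup

# Assumption: the server sends GBK-encoded bytes. The cell text is invented.
raw = "<td>中文</td>".encode("gbk")

# from_encoding tells BeautifulSoup how to decode the raw bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gb18030")

cell_text = soup.td.get_text()
print(cell_text)
```

Passing from_encoding="UTF-8" for bytes that are not UTF-8, as in the question's code, forces BeautifulSoup to fall back to guessing.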

Error logging into instagram with python

Question: I am trying to log into my Instagram account via a Python script using argparse. It seems to connect, but it prints out "This page could not be loaded. If you have cookies disabled in your browser, or you are browsing in Private Mode, please try enabling cookies or turning off Private Mode, and then retrying your action." Here's my code: import argparse import mechanicalsoup from bs4 import BeautifulSoup parser = argparse.ArgumentParser(description='Login to Instagram.') parser.add_argument("username

Using/importing Beautiful Soup 4 without installation

Question: As the Beautiful Soup documentation says: If all else fails, the license for Beautiful Soup allows you to package the entire library with your application. You can download the tarball, copy its bs4 directory into your application’s codebase, and use Beautiful Soup without installing it at all. This is exactly what I want, and what I've done... up to the point of using it in my code. I don't know how to import Beautiful Soup 4. Unlike v3, there's no standalone BeautifulSoup.py , just that bs4
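
A minimal sketch of importing a bundled copy, assuming the bs4 directory was copied into a subfolder of the application. If bs4 sits directly next to the script, a plain `from bs4 import BeautifulSoup` is enough. The helper name add_vendor_dir and the folder name vendor are hypothetical.

```python
import sys
from pathlib import Path


def add_vendor_dir(vendor_dir):
    """Put vendor_dir at the front of sys.path so packages in it import first.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    path = str(Path(vendor_dir).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Assumed layout (hypothetical): <app>/vendor/bs4/__init__.py
# add_vendor_dir(Path(__file__).parent / "vendor")
# from bs4 import BeautifulSoup
```

The package is imported by its directory name, bs4, which is why there is no standalone BeautifulSoup.py to import as in v3.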

Scraping all mobiles of Flipkart.com

Question: I am trying to scrape all the mobiles from www.flipkart.com. My plan is to scrape them from this page: http://www.flipkart.com/mobiles/pr?p[]=sort%3Dprice_asc&sid=tyy%2C4io&layout=grid The problem is that on this site I have to press 'show more results' to see more results. How can I do this in code? I am using the BeautifulSoup package in Python. My code so far: import bs4 import re import urllib2 import sys link = 'http://www.flipkart.com

How do I use BeautifulSoup4 to get ALL text before <br> tag

Question: I'm trying to scrape some data for my app. My question is I need some Here is the HTML code: <tr> <td> This <a class="tip info" href="blablablablabla">is a first</a> sentence. <br> This <a class="tip info" href="blablablablabla">is a second</a> sentence. <br>This <a class="tip info" href="blablablablabla">is a third</a> sentence. <br> </td> </tr> I want the output to look like: This is a first sentence. This is a second sentence. This is a third sentence. Is it possible to do that? Answer 1: Try this.
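
The answer is cut off after "Try this.", so its code is not available. One possible approach, offered as a sketch rather than as the original answer: walk the children of the <td>, collect text until a <br> is reached, and start a new sentence there.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Markup copied from the question.
html = """
<tr> <td> This <a class="tip info" href="blablablablabla">is a first</a> sentence. <br>
This <a class="tip info" href="blablablablabla">is a second</a> sentence.
<br>This <a class="tip info" href="blablablablabla">is a third</a> sentence. <br> </td> </tr>
"""
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

sentences = []
current = []  # text fragments collected since the previous <br>


def flush():
    # Join the fragments, collapse runs of whitespace, skip empty results.
    text = " ".join("".join(current).split())
    if text:
        sentences.append(text)
    current.clear()


for node in td.children:
    if isinstance(node, Tag) and node.name == "br":
        flush()  # a <br> ends the current sentence
    elif isinstance(node, Tag):
        current.append(node.get_text())  # e.g. the text inside <a>
    else:
        current.append(str(node))  # a plain text node

flush()  # keep any text that follows the last <br>

for sentence in sentences:
    print(sentence)
```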

beautiful soup findall multiple class using one query

Question: I searched thoroughly for a solution on many websites and on here, but none of them work! I am trying to scrape flashscores.com and I want to parse a <td> with the class name cell_ab team-home or cell_ab team-home bold. I tried using re: soup.find_all('td', { 'class'= re.compile(r"^(cell_ab team-home |cell_ab team-home bold )$")) and soup.find_all('td', { 'class' : ['cell_ab team-home ','cell_ab team-home bold ']) Neither of them works. Someone asked for the code, so here it is: from tkinter
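
A hedged sketch of one way to match both variants in a single query. BeautifulSoup treats class as a multi-valued attribute, so a filter function can test for the two shared classes and ignore an extra bold. The table markup and team names below are invented, and has_required_classes is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Invented sample markup; only the class names come from the question.
html = """
<table><tr>
<td class="cell_ab team-home">Team A</td>
<td class="cell_ab team-home bold">Team B</td>
<td class="cell_ab team-away">Team C</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

required = {"cell_ab", "team-home"}


def has_required_classes(tag):
    # tag.get("class") is a list of class tokens, e.g. ["cell_ab", "team-home"].
    return tag.name == "td" and required.issubset(tag.get("class", []))


cells = soup.find_all(has_required_classes)
names = [td.get_text() for td in cells]

print(names)
```

Because the test is on individual class tokens, the order of the classes and any trailing whitespace in the attribute do not matter.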