beautifulsoup

Python Beautifulsoup Getting Attribute Value

北城余情 submitted on 2020-01-04 08:35:07
Question: I'm having difficulty getting the proper syntax to extract the value of an attribute in BeautifulSoup with HTML 5.0. I've isolated the occurrence of a tag in my soup using the proper syntax where there is an HTML 5 issue: tags = soup.find_all(attrs={"data-topic":"recUpgrade"}) Taking just tags[1]: date = tags[1].find(attrs={"data-datenews":True}) and date here is: <span class="invisible" data-datenews="2018-05-25 06:02:19" data-idnews="2736625" id="horaCompleta"></span> But now I want to
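A Tag exposes its attributes like a dictionary, so once date points at the <span>, the value is read by indexing with the attribute name. A minimal sketch, assuming a hypothetical wrapper <div> around the quoted <span>. It contains only one match, so it uses tags[0] where the question uses tags[1]:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <span> from the question inside a container that
# carries the data-topic attribute the find_all() call filters on.
html = """
<div data-topic="recUpgrade">
  <span class="invisible" data-datenews="2018-05-25 06:02:19"
        data-idnews="2736625" id="horaCompleta"></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all(attrs={"data-topic": "recUpgrade"})

# attrs={"data-datenews": True} matches any tag that has the attribute at all.
date = tags[0].find(attrs={"data-datenews": True})

# Index like a dict (KeyError if absent) or use .get() (None if absent).
timestamp = date["data-datenews"]
news_id = date.get("data-idnews")
missing = date.get("data-missing")

print(timestamp)  # 2018-05-25 06:02:19
print(news_id)    # 2736625
```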

BeautifulSoup - find_all div tags with different class name

ぃ、小莉子 submitted on 2020-01-04 07:42:38
Question: I want to select all <div> elements whose class is either post has-profile bg2 OR post has-profile bg1, but not the last one, i.e. panel: <div id="6" class="post has-profile bg2"> some text 1 </div> <div id="7" class="post has-profile bg1"> some text 2 </div> <div id="8" class="post has-profile bg2"> some text 3 </div> <div id="9" class="post has-profile bg1"> some text 4 </div> <div class="panel bg1" id="abc"> ... </div> select() is matching only a single occurrence. I'm trying it with find_all(), but
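In BeautifulSoup, class is a multi-valued attribute: class_="post" matches any tag whose class list contains post, which covers all four posts but not the panel. select() returns every match (select_one() returns only the first); the question doesn't show the selector that was used, so why it returned a single element is unknown. Three equivalent sketches:

```python
from bs4 import BeautifulSoup

html = """
<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: class_ matches if ANY of the tag's classes equals "post".
posts = soup.find_all("div", class_="post")

# Option 2: CSS selector requiring both classes; select() returns a list.
posts_css = soup.select("div.post.has-profile")

# Option 3: explicit filter on the exact set of classes.
wanted = [{"post", "has-profile", "bg1"}, {"post", "has-profile", "bg2"}]
posts_fn = soup.find_all(
    lambda t: t.name == "div" and set(t.get("class", [])) in wanted
)

print([d["id"] for d in posts])  # ['6', '7', '8', '9']
```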

BeautifulSoup not extracting all html

只谈情不闲聊 submitted on 2020-01-04 06:26:14
Question: We are trying to get product URLs from this page of Forever 21's site (http://www.forever21.com/Product/Category.aspx?br=f21&category=dress&pagesize=100&page=1). For some reason, BeautifulSoup is not getting the elements with class "item_pic", even though they are in the site's HTML. We have tried using requests, mechanize, and selenium, with no luck. All the commented-out code is from previous attempts to get the HTML (none of which worked). Here is our code: from bs4 import BeautifulSoup
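BeautifulSoup can only find elements that are present in the HTML string it is given; it does not run JavaScript. The question doesn't show what the server actually returned, so whether this page builds its product grid client-side is an assumption. A quick check before changing parsers is to compare raw occurrences of the class name with parsed matches. A minimal sketch; class_presence and the sample responses are hypothetical:

```python
from typing import Tuple

from bs4 import BeautifulSoup


def class_presence(html: str, cls: str) -> Tuple[int, int]:
    """Return (raw text occurrences, parsed elements) for a class name.

    raw > 0 but parsed == 0 usually means the name only appears inside a
    script or template, i.e. the elements are built later by JavaScript.
    raw == 0 means the name is not in this response at all.
    """
    raw_hits = html.count(cls)
    soup = BeautifulSoup(html, "html.parser")
    parsed_hits = len(soup.find_all(class_=cls))
    return raw_hits, parsed_hits


# Hypothetical responses, not the real Forever 21 page.
static_page = '<div class="item_pic"><a href="/p/1">Dress</a></div>'
js_page = '<div id="products"></div><script>render("item_pic")</script>'

print(class_presence(static_page, "item_pic"))  # (1, 1)
print(class_presence(js_page, "item_pic"))      # (1, 0)
```

If the class shows up only inside scripts or not at all, switching parsers won't help; the options are rendering the page first (for example with Selenium, then passing driver.page_source to BeautifulSoup) or requesting whatever data source the page's scripts load.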

Prevent BeautifulSoup's renderContents() from changing &nbsp; to Â

家住魔仙堡 submitted on 2020-01-04 06:15:44
Question: I'm using bs4 to do some work on some text, but in some cases it converts &nbsp; characters to Â. The best I can tell, this is an encoding mismatch from UTF-8 to latin1 (or the reverse?). Everything in my web app is UTF-8, Python 3 is UTF-8, and I've confirmed the database is UTF-8. I've narrowed the problem down to this one line: print("Before soup: " + text) # Before soup: &nbsp; soup = BeautifulSoup(text, "html.parser") #.... do stuff to soup, but all commented out for this testing. soup =
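One way to reproduce the symptom: the parser turns &nbsp; into the character U+00A0, renderContents() (like encode()) returns UTF-8 bytes, and UTF-8 encodes U+00A0 as the two bytes C2 A0. If those bytes are later decoded as Latin-1, the C2 byte shows up as Â. The question's code is cut off, so the Latin-1 decode step is an assumption. A minimal sketch:

```python
from bs4 import BeautifulSoup

text = "<p>a&nbsp;b</p>"
soup = BeautifulSoup(text, "html.parser")

# The parser turns the &nbsp; entity into the character U+00A0.
as_str = str(soup)

# encode()/renderContents() return bytes; in UTF-8, U+00A0 is b'\xc2\xa0'.
as_bytes = soup.encode("utf-8")

# Decoding those bytes as Latin-1 maps 0xC2 to 'Â': the reported symptom.
mojibake = as_bytes.decode("latin-1")

# Fix 1: decode with the same codec that produced the bytes (or stay in str).
fixed = as_bytes.decode("utf-8")

# Fix 2: ask for named HTML entities instead of raw characters on output.
with_entities = soup.decode(formatter="html")

print(repr(mojibake))   # the 'Â' appears before the non-breaking space
print(with_entities)    # <p>a&nbsp;b</p>
```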

Chinese character encoding error with BeautifulSoup in Python?

心不动则不痛 submitted on 2020-01-04 05:23:07
Question: I'd like to use BeautifulSoup to get the data in a table from a website, but it can't grab the Chinese characters correctly. This is my code: #!/usr/bin/env python # -*- coding: utf-8 -*- import urllib2 from bs4 import BeautifulSoup html=urllib2.urlopen("http://www.515fa.com/che_1978.html").read() soup=BeautifulSoup(html,from_encoding="UTF-8") print soup.prettify() And the Chinese characters are displayed like this: <td align="center" bgcolor="#FFFFFF" u1:str="" width="173"> ćé¸</td> <td
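The code passes from_encoding="UTF-8", which tells BeautifulSoup to decode the bytes as UTF-8 regardless of what the server sent; if the page uses a different encoding, the text comes out garbled. This sketch assumes GBK purely for illustration (the real page's encoding isn't shown) and uses Python 3, whereas the question's code is Python 2 (urllib2, print statement):

```python
from bs4 import BeautifulSoup

# Assumption: the page is GBK-encoded. Check the real page's <meta charset>
# or Content-Type header; this snippet does not inspect 515fa.com.
raw = "<table><tr><td>车牌</td></tr></table>".encode("gbk")

# Option 1: tell BeautifulSoup the encoding the bytes were written in.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")
print(soup.td.get_text())   # 车牌

# Option 2: omit from_encoding and let BeautifulSoup detect it; a
# <meta charset> declaration in the document is one of the hints it uses.
raw_with_meta = '<meta charset="gbk"><table><tr><td>车牌</td></tr></table>'.encode("gbk")
auto = BeautifulSoup(raw_with_meta, "html.parser")
print(auto.td.get_text())   # 车牌
```

When the page declares its charset correctly, Option 2 avoids hard-coding anything, and soup.original_encoding reports which encoding was actually used.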

Error logging into instagram with python

孤人 submitted on 2020-01-04 05:10:46
Question: I am trying to log into my Instagram account via a Python script using argparse. It seems to connect, but it prints out "This page could not be loaded. If you have cookies disabled in your browser, or you are browsing in Private Mode, please try enabling cookies or turning off Private Mode, and then retrying your action." Here's my code: import argparse import mechanicalsoup from bs4 import BeautifulSoup parser = argparse.ArgumentParser(description='Login to Instagram.') parser.add_argument("username

Using/importing Beautiful Soup 4 without installation

余生长醉 submitted on 2020-01-04 01:53:07
Question: As the Beautiful Soup documentation says: "If all else fails, the license for Beautiful Soup allows you to package the entire library with your application. You can download the tarball, copy its bs4 directory into your application's codebase, and use Beautiful Soup without installing it at all." This is exactly what I want, and what I've done... up to the point of using it in my code. I don't know how to import Beautiful Soup 4. Unlike v3, there's no standalone BeautifulSoup.py, just that bs4
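Python puts the directory of the script being run at the front of the module search path, so if the copied bs4 directory sits right next to the main script, from bs4 import BeautifulSoup works unchanged. If vendored packages live in a subfolder instead, that folder has to be added to sys.path before the import. A minimal sketch; the vendor folder name and the helper add_vendor_path are hypothetical:

```python
import os
import sys


def add_vendor_path(app_dir: str, vendor_subdir: str = "vendor") -> str:
    """Put <app_dir>/<vendor_subdir> at the front of sys.path (idempotent).

    Assumed (hypothetical) layout:
        myapp/
            main.py
            vendor/
                bs4/    <- copied from the Beautiful Soup tarball
    """
    path = os.path.join(os.path.abspath(app_dir), vendor_subdir)
    if path not in sys.path:
        # Inserting at index 0 makes the vendored copy win over any
        # installed version with the same package name.
        sys.path.insert(0, path)
    return path
```

In main.py, call add_vendor_path(os.path.dirname(os.path.abspath(__file__))) and then run from bs4 import BeautifulSoup. Recent Beautiful Soup releases declare dependencies of their own (for example soupsieve, used by select()), which would need to be copied the same way.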

Scraping all mobiles of Flipkart.com

这一生的挚爱 submitted on 2020-01-03 19:34:35
Question: I am trying to scrape all the mobiles from www.flipkart.com. My plan is to scrape all mobiles from here: http://www.flipkart.com/mobiles/pr?p[]=sort%3Dprice_asc&sid=tyy%2C4io&layout=grid The problem is that on this website I have to press 'show more results' to see more results. How can I do this in code? I am using the BeautifulSoup package in Python. My code so far: import bs4 import re import urllib2 import sys link = 'http://www.flipkart.com
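BeautifulSoup only parses HTML; it cannot click a button. A "show more results" button usually fires a request for the next batch, and if that request can be found in the browser's network tools, it can be repeated in a loop with an increasing offset. The endpoint and its parameters are not in the question, so fetch_page below is a hypothetical stand-in, and the "product" class name is made up:

```python
from typing import Callable, List

from bs4 import BeautifulSoup


def scrape_all_pages(
    fetch_page: Callable[[int], str],
    item_class: str,
    page_size: int,
    max_pages: int = 100,
) -> List[str]:
    """Collect item texts batch by batch until a batch comes back empty.

    fetch_page(offset) is a hypothetical callable returning the HTML that the
    "show more results" request would return for that offset.
    """
    results: List[str] = []
    for page in range(max_pages):  # guard against endless pagination
        soup = BeautifulSoup(fetch_page(page * page_size), "html.parser")
        items = [el.get_text(strip=True) for el in soup.find_all(class_=item_class)]
        if not items:
            break
        results.extend(items)
    return results


# Stand-in for real HTTP requests: two batches, then nothing.
fake_batches = {
    0: '<div class="product">Phone A</div><div class="product">Phone B</div>',
    2: '<div class="product">Phone C</div>',
}
print(scrape_all_pages(lambda offset: fake_batches.get(offset, ""), "product", 2))
# ['Phone A', 'Phone B', 'Phone C']
```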

How do I use BeautifulSoup4 to get ALL text before <br> tag

不打扰是莪最后的温柔 submitted on 2020-01-03 17:12:31
Question: I'm trying to scrape some data for my app. My question is I need some… Here is the HTML code: <tr> <td> This <a class="tip info" href="blablablablabla">is a first</a> sentence. <br> This <a class="tip info" href="blablablablabla">is a second</a> sentence. <br>This <a class="tip info" href="blablablablabla">is a third</a> sentence. <br> </td> </tr> I want the output to look like: This is a first sentence. This is a second sentence. This is a third sentence. Is it possible to do that? Answer 1: Try this.
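One approach: walk the <td>'s direct children, accumulate text (including the text inside each <a>), and close a line every time a <br> is reached. A sketch using the HTML from the question and Python's built-in html.parser:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

html = """<tr> <td> This <a class="tip info" href="blablablablabla">is a first</a> sentence. <br> This <a class="tip info" href="blablablablabla">is a second</a> sentence. <br>This <a class="tip info" href="blablablablabla">is a third</a> sentence. <br> </td> </tr>"""

soup = BeautifulSoup(html, "html.parser")
cell = soup.find("td")

lines = []
current = []
for node in cell.children:
    if isinstance(node, Tag) and node.name == "br":
        # A <br> ends the current line: join its pieces, collapse whitespace.
        lines.append(" ".join("".join(current).split()))
        current = []
    elif isinstance(node, Tag):
        current.append(node.get_text())   # e.g. the text inside <a>
    elif isinstance(node, NavigableString):
        current.append(str(node))         # bare text between tags

# Keep any text after the last <br>, ignoring pure whitespace.
tail = " ".join("".join(current).split())
if tail:
    lines.append(tail)

for line in lines:
    print(line)
```

If a single line is enough, cell.get_text(" ", strip=True) joins all the pieces with spaces instead.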

beautiful soup findall multiple class using one query

二次信任 submitted on 2020-01-03 16:53:20
Question: I searched thoroughly for a solution on many websites and here, but none of them work! I am trying to scrape flashscores.com and I want to parse a <td> with the class name cell_ab team-home or cell_ab team-home bold. I tried using re: soup.find_all('td', { 'class'= re.compile(r"^(cell_ab team-home |cell_ab team-home bold )$")) and soup.find_all('td', { 'class' : ['cell_ab team-home ','cell_ab team-home bold ']) Neither of them works. Someone requested the code, so here it is: from tkinter
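As quoted, the first call uses 'class'= instead of 'class': and both calls are missing the closing }, so neither would run. Beyond that, BeautifulSoup treats class as a multi-valued attribute, and the strings in the question end with a space, so they would not match the class value BeautifulSoup compares against. Matching on individual class names avoids both problems. A sketch with made-up cells:

```python
from bs4 import BeautifulSoup

# Made-up cells standing in for the flashscores.com table.
html = """<table><tr>
<td class="cell_ab team-home">Home 1</td>
<td class="cell_ab team-home bold">Home 2</td>
<td class="cell_ab team-away">Away</td>
</tr></table>"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: the <td> must carry both classes; "bold" is optional.
via_css = soup.select("td.cell_ab.team-home")

# class_ with one name: matches any <td> whose class list contains it.
via_class = soup.find_all("td", class_="team-home")

print([td.get_text() for td in via_css])  # ['Home 1', 'Home 2']
```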