beautifulsoup | 易学教程

find specific text in beautifulsoup

Question: I am trying to get a specific piece of text using BeautifulSoup and Python, but I am not sure how to get it with soup.find(). I want to obtain only "#1 in Beauty" from the following:

    <ul> <li>...<li> <li>...<li> <li id="salesRank"> <b>Amazon Best Sellers Rank:</b> "#1 in Beauty (" <a href="http://www.amazon.com/gp/bestsellers/beauty/ref=pd_dp_ts_k_1"> See top 100</a> ")

Can anyone help me with this?

Answer 1: You need to use the find_all method of soup. Try the code below: import urllib,
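
The answer above is cut off, so the following is only a minimal sketch of one way to isolate the text, not the answerer's code. It assumes the quotation marks in the question come from the browser inspector's display of text nodes, and it uses a tidied copy of the markup; the helper name `sales_rank` is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

# Tidied copy of the markup from the question (an assumption about the real page).
html = """
<ul>
  <li>...</li>
  <li id="salesRank">
    <b>Amazon Best Sellers Rank:</b>
    #1 in Beauty (<a href="http://www.amazon.com/gp/bestsellers/beauty/ref=pd_dp_ts_k_1">See top 100</a>)
  </li>
</ul>
"""


def sales_rank(markup):
    """Return the text that directly follows the <b> label, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    item = soup.find("li", id="salesRank")
    if item is None:
        return None
    label = item.find("b")
    if label is None:
        return None
    # The rank is a bare text node sitting between </b> and <a>.
    text = label.next_sibling
    if not isinstance(text, NavigableString):
        return None
    # Trim whitespace, stray quotes, and the opening parenthesis.
    return text.strip(' \n\t"(')


rank = sales_rank(html)
print(rank)
```

Reading `next_sibling` of the `<b>` tag avoids pulling in the link text, which `item.get_text()` would include.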

Webscrape Multiple Pages with python - output issue

Question: Happy new year, Python community. I am trying to extract a table from a website using Python and BeautifulSoup4, but I am struggling to see the results in my output files. The code runs smoothly, but nothing is written to the file. My code is below:

    from bs4 import BeautifulSoup as bsoup
    import requests as rq
    import re
    base_url = 'http://www.creationdentreprise.sn/rechercher-une-societe?field_rc_societe_value=&field_ninea_societe_value=&denomination=&field_localite_nid=All&field_siege_societe_value=&field_forme
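
The question's code is truncated, so the actual cause of the empty file is not visible here. As a hedged sketch, the block below shows one pattern that makes such problems easier to spot: collect the rows from each page, write them through a `csv.writer`, and count what was written. The helper names `table_rows` and `write_pages` and the sample pages are hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_rows(markup):
    """Return every table row on a page as a list of cell strings."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def write_pages(pages, out_file):
    """Write the rows of all pages to out_file and return the row count."""
    writer = csv.writer(out_file)
    count = 0
    for markup in pages:
        for row in table_rows(markup):
            writer.writerow(row)
            count += 1
    return count


# Made-up sample pages standing in for the downloaded HTML.
page_1 = "<table><tr><th>Name</th><th>City</th></tr><tr><td>Alpha SARL</td><td>Dakar</td></tr></table>"
page_2 = "<table><tr><td>Beta SA</td><td>Thies</td></tr></table>"

buffer = io.StringIO()
written = write_pages([page_1, page_2], buffer)
print(written)
print(buffer.getvalue())
```

With a real file, opening it as `with open("out.csv", "w", newline="", encoding="utf-8") as f:` and calling `write_pages(pages, f)` inside the block ensures the file is flushed and closed. A returned count of zero would suggest that the selectors matched nothing rather than that the write failed.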

BeautifulSoup not fetching the Data

Question: I am trying to fetch data from a website, but I am not getting any of the information for fields like Name, Nature of Business, Telephone, Email, etc. in the variable soup. What should I add to the code below to get this data?

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    page = "http://www.pmas.sg/page/members-directory"
    pages = requests.get(page)
    soup = BeautifulSoup(pages.content, 'html.parser')
    print(soup)

The output I am getting using the above code is: <!DOCTYPE
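
The output in the question is cut off, so it is not possible to tell from this excerpt why the fields are missing. One possibility, stated here only as an assumption, is that the page fills in the directory with JavaScript after loading, in which case the HTML returned by `requests` would not contain the data. The sketch below is a small diagnostic for that case; the helper name `missing_labels` and both sample pages are hypothetical.

```python
from bs4 import BeautifulSoup


def missing_labels(markup, labels):
    """Return the labels that do not appear in the page's visible text."""
    soup = BeautifulSoup(markup, "html.parser")
    # Script and style contents are not visible text, so drop them first.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    text = soup.get_text(" ", strip=True).lower()
    return [label for label in labels if label.lower() not in text]


# Made-up pages: one with the data in the HTML, one that builds it in a script.
static_page = "<html><body><p>Name: Acme</p><p>Telephone: 123</p><p>Email: a@b.sg</p></body></html>"
script_page = "<html><body><div id='app'></div><script>render({'Telephone': '123'})</script></body></html>"

labels = ["Name", "Telephone", "Email"]
print(missing_labels(static_page, labels))
print(missing_labels(script_page, labels))
```

If every label is reported missing for the downloaded page, the data is probably not in the static HTML, and a different data source (for example, a request the page itself makes) would be needed.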

Webscraping with Python, I can't see the actual names of classes when I say inspect page

Question: OK, so I am just learning Python and I want to use web scraping. I was watching a tutorial, and the tutor has a totally different "inspect" page (or whatever it is called) than mine. What he sees is class = "ProfileHeaderCard", and what I see is class = "css-1dbjc4n r-1iusvr4 r-16y2uox r-5f2r5o r-m611by". The important part is that the BeautifulSoup library does not work when I use my version of the class name, but it works when I use his version. When I say print(soup.find('div', {"class
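
The question is truncated, and the page fetched by the script may simply differ from what the browser inspector shows. Independently of that, it can help to know how BeautifulSoup matches a `class` attribute that holds several names. The following is a minimal sketch on made-up markup that reuses three of the class names from the question.

```python
from bs4 import BeautifulSoup

# Made-up markup reusing class names quoted in the question.
html = '<div class="css-1dbjc4n r-1iusvr4 r-16y2uox"><span>Profile</span></div>'
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any tag that carries it.
by_single = soup.find("div", class_="r-1iusvr4")

# The full attribute string matches only in the exact original order.
by_exact = soup.find("div", class_="css-1dbjc4n r-1iusvr4 r-16y2uox")
by_reordered = soup.find("div", class_="r-1iusvr4 css-1dbjc4n r-16y2uox")

# A CSS selector matches several classes regardless of their order.
by_selector = soup.select_one("div.r-16y2uox.css-1dbjc4n")

print(by_single is not None, by_exact is not None)
print(by_reordered is None, by_selector is not None)
```

Generated class names like these tend to change between page versions, so matching on a single stable class or on the tag structure is usually less brittle than matching the whole string.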

How to crawl for specific links inside a website?

Question: I have successfully crawled the headlines and the links. I would like to replace the Summary column with the main article from each link (since the title and summary are the same anyway).

    link = "https://www.vanglaini.org" + article.a['href']

(e.g. https://www.vanglaini.org/tualchhung/103834)

Please help me modify my code. My code is below:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    source = requests.get('https://www.vanglaini.org/').text
    soup = BeautifulSoup(source, 'lxml')
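
The rest of the question's code is not shown, so the following is a hedged sketch of the general two-step approach rather than a fix for that code: collect the article links from the listing page, then fetch each link and extract its body text. The site's real markup is unknown here; the sketch assumes headlines sit in `<article>` elements and that the body text is in `<p>` tags, and the helper names are hypothetical. The `fetch` argument is any function that returns HTML for a URL, which keeps the example runnable without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.vanglaini.org"


def article_links(listing_html, base_url=BASE_URL):
    """Return absolute URLs for the first link in each <article> (assumed layout)."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    for article in soup.find_all("article"):
        anchor = article.find("a", href=True)
        if anchor is not None:
            links.append(urljoin(base_url, anchor["href"]))
    return links


def article_body(article_html):
    """Join the text of all <p> tags on an article page (assumed layout)."""
    soup = BeautifulSoup(article_html, "html.parser")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return "\n".join(text for text in paragraphs if text)


def crawl(listing_html, fetch, base_url=BASE_URL):
    """Return one record per article: its link and its main text."""
    return [
        {"link": link, "article": article_body(fetch(link))}
        for link in article_links(listing_html, base_url)
    ]


# Made-up pages standing in for the real site.
listing = '<article><a href="/tualchhung/103834">Headline</a></article>'
fake_site = {
    "https://www.vanglaini.org/tualchhung/103834": "<p>First paragraph.</p><p>Second paragraph.</p>"
}

records = crawl(listing, fake_site.__getitem__)
print(records)
```

Against the live site, `fetch` could be `lambda url: requests.get(url).text`, and the resulting list of dictionaries can be passed to `pd.DataFrame`.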

using beautifulsoup 4 for xml causes strange behaviour (memory issues?)

Question: I'm getting strange behaviour with this:

    >>> from bs4 import BeautifulSoup
    >>> smallfile = 'small.xml'  # approx. 600 bytes
    >>> largerfile = 'larger.xml'  # approx. 2300 bytes
    >>> len(BeautifulSoup(open(smallfile, 'r'), ['lxml', 'xml']))
    1
    >>> len(BeautifulSoup(open(largerfile, 'r'), ['lxml', 'xml']))
    0

Contents of small.xml:

    <?xml version="1.0" encoding="us-ascii"?>
    <Catalog>
    <CMoverMissile id="HunterSeekerMissile">
    <MotionPhases index="1">
    <Driver value="Guidance"/>
    <Acceleration value="3200"/>

How can i crawl web data that not in tags

Question:

    <div id="main-content" class="content">
    <div class="metaline"> <span class="article-meta author">jorden</span> </div>
    " 1.name:jorden> 2.age:28 -- "
    <span class="D2"> from 111.111.111.111 </span>
    </div>

I only need 1.name:jorden 2.age:28. xxx.select('#main-content') returns everything, but I only need part of it. Because the text is not inside any tag, I don't know how to get it.

Answer 1: You want to find the tag before the text in question (in your case, <div class="metaline"> ) and then look at
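
The answer is cut off after "look at", so what follows is a sketch of one plausible reading, not the answerer's own code: locate the `<div class="metaline">` tag and read the text node that comes right after it through `next_sibling`. The markup is a tidied copy of the question's, and the helper name `loose_text` and the handling of the `--` separator line are assumptions.

```python
from bs4 import BeautifulSoup, NavigableString

# Tidied copy of the markup from the question.
html = """<div id="main-content" class="content"><div class="metaline"><span class="article-meta author">jorden</span></div>
1.name:jorden
2.age:28
--
<span class="D2"> from 111.111.111.111 </span></div>"""


def loose_text(markup):
    """Return the non-empty lines of the text node that follows div.metaline."""
    soup = BeautifulSoup(markup, "html.parser")
    metaline = soup.select_one("#main-content .metaline")
    if metaline is None:
        return []
    node = metaline.next_sibling
    if not isinstance(node, NavigableString):
        return []
    lines = [line.strip() for line in node.splitlines()]
    # Drop blank lines and the "--" separator line.
    return [line for line in lines if line and line != "--"]


fields = loose_text(html)
print(fields)
```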

How to extract the strong elements which are in div tag

Question: I am new to web scraping and am using Python to scrape the data. Can someone help me extract data from the following?

    <div class="dept"><strong>LENGTH:</strong> 15 credits</div>

My output should be:

    LENGTH: 15 credits

Here is my code:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    length = bsObj.findAll("strong")
    for leng in length:
        print(leng.text, leng.next_sibling)

Output:

    DELIVERY: Campus
    LENGTH: 2 years
    OFFERED BY: Olin Business School

but I would like to have only LENGTH.
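
No answer is included in this excerpt. As a minimal sketch, one way to keep only the LENGTH entry is to compare each `<strong>` tag's text against the wanted label before printing. The sample markup below is made up to mirror the question's output, and the helper name `field_value` is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

# Made-up markup mirroring the three fields in the question's output.
html = """
<div class="dept"><strong>DELIVERY:</strong> Campus</div>
<div class="dept"><strong>LENGTH:</strong> 15 credits</div>
<div class="dept"><strong>OFFERED BY:</strong> Olin Business School</div>
"""


def field_value(markup, label):
    """Return 'label value' for the first <strong> whose text equals label."""
    soup = BeautifulSoup(markup, "html.parser")
    for strong in soup.find_all("strong"):
        if strong.get_text(strip=True) == label:
            sibling = strong.next_sibling
            # The value is the bare text node right after </strong>.
            value = sibling.strip() if isinstance(sibling, NavigableString) else ""
            return f"{label} {value}".strip()
    return None


result = field_value(html, "LENGTH:")
print(result)
```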