beautifulsoup

Why is BeautifulSoup's findAll returning an empty list when I search by class?

二次信任 submitted on 2021-02-08 03:12:01
Question: I am trying to scrape a page by searching for an h2 tag, but BeautifulSoup returns an empty list.

    <h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">

    html=urlopen("https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job")
    bs0bj=BeautifulSoup(html,"lxml")
    nameList=bs0bj.findAll("h2",{"class":"iCIMS_InfoMsg iCIMS_InfoField_Job"})
    print(nameList)

Answer 1: The content is inside an iframe and is updated via JavaScript, so it is not present in the initial response. You can use the same link the page is using
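The effect the answer describes can be reproduced offline. The sketch below is only an illustration: the two HTML strings, the iframe id, and the example.com URL are made up, and "html.parser" is used so that no extra parser is needed. It shows that the class search finds nothing in the outer page, that the iframe's src can be read from the outer page, and that the same search succeeds on the framed document.

```python
from bs4 import BeautifulSoup

# Hypothetical outer page: the job details live in an iframe, not in this HTML.
outer_html = """
<html><body>
  <iframe id="content_frame" src="https://example.com/embedded-job"></iframe>
</body></html>
"""

# Hypothetical document that the iframe would load.
frame_html = """
<html><body>
  <h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">Overview</h2>
</body></html>
"""


def find_job_headings(html):
    """Return all h2 tags whose class attribute is exactly the two classes below."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("h2", {"class": "iCIMS_InfoMsg iCIMS_InfoField_Job"})


# The outer page has no matching h2, so the result is an empty list.
outer_matches = find_job_headings(outer_html)

# The iframe's src attribute is the URL that would have to be requested next.
frame_src = BeautifulSoup(outer_html, "html.parser").find("iframe")["src"]

# The same search works once it runs against the framed document.
frame_matches = find_job_headings(frame_html)

print(outer_matches, frame_src, [h.get_text() for h in frame_matches])
```

In a real scraper the framed document would be fetched from the iframe's URL; that request is left out here so the sketch runs without network access.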

How do I get all text from within this tag?

我是研究僧i submitted on 2021-02-07 22:39:18
Question: I'm trying to get all text from within this HTML tag, which I store in the variable tag:

    <td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> & His Orchestra</td>

The result should be "Glenn Miller & His Orchestra". But printing tag.find(text=True) returns only "Glenn Miller". How do I get the rest of the text within the td element?

Answer 1: tag.find(text=True) returns only the first matching text node. Use .get_text() instead: >>> from
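A minimal sketch of the difference, using the td element from the question. It assumes the "html.parser" backend and uses string=True, the newer spelling of the text=True argument in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

html = (
    '<td rowspan="2" style="text-align: center;">'
    '<a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a>'
    " & His Orchestra</td>"
)
tag = BeautifulSoup(html, "html.parser").td

# find(string=True) stops at the first text node, which is inside the <a> tag.
first_text = tag.find(string=True)

# get_text() joins every text node under the tag.
full_text = tag.get_text()

print(first_text)
print(full_text)
```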

Split an element with BeautifulSoup

人走茶凉 submitted on 2021-02-07 19:28:27
Question: I have some HTML that I'm parsing with BeautifulSoup. One of the requirements is that tags are not nested in paragraphs or other text tags. For example, if I have code like this:

    <p> first text <a href="..."> <img .../> </a> second text </p>

I need to transform it into something like this:

    <p>first text</p> <img .../> <p>second text</p>

I have written something to extract the images and add them after the paragraph, like this:

    for match in soup.body.find_all(True, recursive=False):
        try:
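One possible way to do the split is sketched below. It is not the asker's code, which is cut off above. The helper name split_paragraph is hypothetical, and the sketch assumes each paragraph contains only plain text and links that wrap images, so any text inside other inline tags would be dropped.

```python
from bs4 import BeautifulSoup, NavigableString


def split_paragraph(soup, p):
    """Replace p with a flat sequence: <p> per text run, images in between."""
    new_nodes = []
    for child in list(p.children):
        if isinstance(child, NavigableString):
            text = child.strip()
            if text:
                # Each non-empty text run becomes its own paragraph.
                new_p = soup.new_tag("p")
                new_p.string = text
                new_nodes.append(new_p)
        else:
            # Pull images out of their wrapper (or take the image itself).
            imgs = [child] if child.name == "img" else child.find_all("img")
            for img in imgs:
                new_nodes.append(img.extract())
    # Insert the new nodes in order, directly before the old paragraph.
    for node in new_nodes:
        p.insert_before(node)
    p.decompose()


html = '<body><p> first text <a href="#"><img src="a.png"/></a> second text </p></body>'
soup = BeautifulSoup(html, "html.parser")
for p in soup.body.find_all("p", recursive=False):
    split_paragraph(soup, p)

print(soup.body)
```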

How to use Beautiful Soup to find a tag with changing id?

别来无恙 submitted on 2021-02-07 18:34:38
Question: I am using Beautiful Soup in Python. Here is an example URL: http://www.locationary.com/place/en/US/Ohio/Middletown/McDonald%27s-p1013254580.jsp In the HTML there are a lot of tags, and the only way I can tell them apart is by their id. The only thing I want to find is the telephone number. The tag looks like this:

    <td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td>

I have gone to other URLs on the same website and found almost the same id for the telephone
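When only part of an id changes, Beautiful Soup accepts a compiled regular expression as the attribute value. The sketch below rests on an assumption the truncated question does not confirm: that the prefix value_xxx_c_1_f_8_a_ is stable and only the trailing digits differ between pages. The second table cell is invented for contrast.

```python
import re

from bs4 import BeautifulSoup

html = (
    "<table><tr>"
    '<td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td>'
    '<td class="dispTxt" id="value_xxx_c_1_f_2_a_99">Middletown</td>'
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# Assumed pattern: fixed prefix, variable numeric suffix.
phone_id = re.compile(r"^value_xxx_c_1_f_8_a_\d+$")

cell = soup.find("td", id=phone_id)
phone = cell.get_text(strip=True) if cell is not None else None

print(phone)
```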

How do I set a value with Beautiful Soup in an HTML element if I know the element's id or class?

感情迁移 submitted on 2021-02-07 11:42:13
Question: How do I set a value with Beautiful Soup in an element if I know the id or class of that HTML element? For example, I have <td id="test"></td> and I want to set the text RESTORE... so that it becomes <td id="test">RESTORE...</td>.

Answer 1: Find the tag you want to modify using a find() search for id=test. Then, from the BeautifulSoup documentation, "Modifying the tree", Modifying .string: If you set a tag's .string attribute, the tag's contents are replaced with the string you give: markup = '<a href="http://example.com/">I
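Applied to the td from the question, the approach in the answer looks roughly like this. The surrounding table markup is added only so there is something to parse.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<table><tr><td id="test"></td></tr></table>', "html.parser")

# Locate the element by id, then replace its contents via .string.
cell = soup.find(id="test")
cell.string = "RESTORE..."

print(cell)
```

To look up by class instead of id, the find call would be soup.find("td", class_="some-class"), where some-class stands in for the real class name.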

Scraping images using beautiful soup

这一生的挚爱 submitted on 2021-02-07 10:53:47
Question: I'm trying to scrape the image from an article using Beautiful Soup. It seems to work, but I can't open the image: I get a file format error every time I try to open it from my desktop. Any insights?

    timestamp = time.asctime()
    # Parse HTML of article, aka making soup
    soup = BeautifulSoup(urllib2.urlopen(url).read())
    # Create a new file to write content to
    txt = open('%s.jpg' % timestamp, "wb")
    # Scrape article main img
    links = soup.find('figure').find_all('img', src=True)
    for link in
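The question is cut off before the part that writes the file, so the actual bug cannot be identified from this excerpt. As a general sketch of the intended flow, the code below collects absolute image URLs from the figure element and then writes the bytes fetched from an image URL, not the tag or the URL text, to a file opened in binary mode. It uses Python 3's urllib.request in place of the question's urllib2, and the function names and example.com URL are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def figure_image_urls(html, base_url):
    """Return absolute URLs of all <img src=...> inside the first <figure>."""
    soup = BeautifulSoup(html, "html.parser")
    figure = soup.find("figure")
    if figure is None:
        return []
    return [urljoin(base_url, img["src"]) for img in figure.find_all("img", src=True)]


def save_image(image_url, path):
    """Download the image bytes and write them to path (needs network access)."""
    with urlopen(image_url) as response, open(path, "wb") as out:
        out.write(response.read())


sample = '<figure><img src="/img/a.jpg"><img alt="no source"></figure>'
urls = figure_image_urls(sample, "https://example.com/article")
print(urls)
```

save_image is defined but not called here, so the sketch runs offline.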

Web-scraping a table to a list

感情迁移 submitted on 2021-02-07 10:39:13
Question: I'm trying to extract a table from a webpage. I have managed to get all the data in the table into a list, but all of it ends up in a single list element. I need help getting the 'clean' data (i.e. the strings, without all the HTML packaging) from each row of the table into its own list element. So instead of... list = [<tr> <th><a href="/7.62x25mm_TT_AKBS" title="7.62x25mm TT AKBS"><img alt="TTAKBS.png" decoding="async" height="64" src="https://static.wikia.nocookie
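A common way to get one list of clean strings per row is to loop over the tr elements and call get_text on each cell. The table below is a made-up stand-in, since the real page's markup is truncated above.

```python
from bs4 import BeautifulSoup

# Hypothetical table; the names and numbers are placeholders.
html = """
<table>
  <tr><th>Name</th><th>Damage</th></tr>
  <tr><td><a href="/x" title="X">Round A</a></td><td>58</td></tr>
  <tr><td>Round B</td><td>50</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # Header and data cells are both kept; strip=True removes surrounding whitespace.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

print(rows)
```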

How should I show results of BeautifulSoup parsing in Django?

倖福魔咒の submitted on 2021-02-07 10:17:29
Question: I'm trying to scrape a web page using BeautifulSoup and Django. Here's my views.py, which does this task:

    def detail(request, article_id):
        article = get_object_or_404(Article, pk=article_id)
        html = urllib2.urlopen("...url...")
        soup = BeautifulSoup(html)
        title = soup.title
        return render(request, 'detail.html', {'article': article, 'title': title})

But when I use {{ title }} in Django template files, it doesn't show anything. I've tested it and it works in the shell. I've added a line to this function:
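One plausible cause, stated as an assumption because the question is cut off: soup.title is a Tag object, not a string, and Tag objects are callable (calling a tag behaves like find_all). A template engine that calls callable variables, as Django's does, would then render the result of that call and not the title. The sketch below leaves Django out and only shows how to turn the tag into a plain string before it goes into the template context.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Example article</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

title_tag = soup.title

# Convert to a plain str; fall back to "" if there is no usable title.
if title_tag is not None and title_tag.string is not None:
    title_text = str(title_tag.string)
else:
    title_text = ""

# This dict stands in for the context passed to render() in the view.
context = {"title": title_text}
print(context)
```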