bs4

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml [duplicate]

☆樱花仙子☆ submitted on 2019-12-12 03:42:32

Question: This question already has answers here: beautifulsoup won't recognize lxml (2 answers). Closed 3 years ago.

Can you please suggest a fix? The script downloads almost all of the images from imgur pages, but it fails on pages with a single image. I'm not sure why it doesn't work in that case or how to fix it:

```python
elif 'imgur.com' in submission.url and not (submission.url.endswith('gif')
        or submission.url.endswith('webm')
        or submission.url.endswith('mp4')
        or 'all' in submission.url
        or '#' in submission.url
        or '/a/' in submission.url
```
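The error in the title means Beautiful Soup could not load the lxml tree builder, which almost always means lxml is not installed in the running interpreter (`pip install lxml` fixes it). A minimal sketch of a guard that prefers lxml but falls back to the stdlib parser:

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup):
    """Prefer the faster lxml builder, but fall back to the
    stdlib html.parser when lxml is not installed."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")

soup = make_soup("<p>hello</p>")
print(soup.p.text)  # 'hello' with either builder
```

Pinning one builder explicitly (rather than omitting the argument) also silences Beautiful Soup's "no parser was explicitly specified" warning.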

bytes object has no attribute find_all

你说的曾经没有我的故事 submitted on 2019-12-12 03:24:50

Question: I've been trying for the last three hours to scrape this website and get the rank, name, wins, and losses of each team. When I run this code:

```python
import requests
from bs4 import BeautifulSoup

halo = requests.get("https://www.halowaypoint.com/en-us/esports/standings")
page = BeautifulSoup(halo.content, "html.parser")
final = page.encode('utf-8')
print(final.find_all("div"))
```

I keep getting this error. If anyone can help me out, it would be much appreciated. Thanks!

Answer 1: You are calling the…
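The answer is cut off above, but the cause follows from the traceback in the title: `page.encode('utf-8')` returns a `bytes` object, and `find_all` is a method of the soup, not of bytes. A minimal sketch of the fix, using a stand-in HTML string so it runs offline instead of hitting the live standings page:

```python
from bs4 import BeautifulSoup

# find_all lives on the BeautifulSoup object; encoding the soup first
# produces bytes, which has no find_all -- hence the AttributeError.
html = "<div>Team A</div><div>Team B</div>"  # stand-in for halo.content
page = BeautifulSoup(html, "html.parser")
teams = [div.get_text() for div in page.find_all("div")]
print(teams)  # ['Team A', 'Team B']
```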

How to input values and click a button with Requests?

寵の児 submitted on 2019-12-12 01:54:21

Question: With the requests module I eventually want to download a song. If you head to youtube-mp3.org, there is one input bar and one convert button. Shortly after the conversion is finished, a download button appears. I want to go through that process with my Python script. So far I have this:

```python
def download_song(song_name):
    import requests
    with requests.Session() as c:
        url = r"http://www.youtube-mp3.org/"
        c.get(url)
```

It's barely anything… I have tried to check the documentation on their website. I…
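Worth noting as context for the question: `requests` has no concept of typing into inputs or clicking buttons; it only sends HTTP requests, so the task is to reproduce the request the convert button would fire (sites like this one typically build that request with JavaScript, which `requests` cannot run). A sketch with `requests.Request` showing what such a form submission looks like on the wire; the endpoint URL and the `video_url` field name are placeholders, not the site's real interface:

```python
import requests

# Hypothetical form action and field name -- "clicking the button" in
# the browser would amount to sending this POST.
req = requests.Request(
    "POST",
    "https://example.org/convert",
    data={"video_url": "https://www.youtube.com/watch?v=XXXX"},
)
prepared = req.prepare()
print(prepared.method, prepared.url)
print(prepared.body)  # the urlencoded form data the button would submit
```

To discover the real endpoint and fields, watch the network tab of the browser's developer tools while pressing the button, then replay that request inside the `requests.Session`.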

Get author names and URLs for a tag from Google Scholar

。_饼干妹妹 submitted on 2019-12-12 01:36:13

Question: I wish to write to a CSV file a list of all authors, together with their URLs, who class themselves under a specific tag on Google Scholar. For example, if we were to take 'security', I would want this output:

```
author         url
Howon Kim      https://scholar.google.pl/citations?user=YUoJP-oAAAAJ&hl=pl
Adrian Perrig  https://scholar.google.pl/citations?user=n-Oret4AAAAJ&hl=pl
...            ...
```

I have written this code, which prints each author's name:

```python
# -*- coding: utf-8 -*-
import urllib.request
import csv
from bs4
```
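The question's code is cut off, but the shape of the task is clear: collect (name, URL) pairs and write them with the `csv` module. A self-contained sketch against an inline snippet of markup; the `gs_ai_name` class and the surrounding structure are assumptions about Google Scholar's author-list markup at the time, and may well have changed:

```python
import csv
import io

from bs4 import BeautifulSoup

# Stand-in for one author card from a Scholar tag-search results page.
html = '''
<div class="gsc_1usr"><h3 class="gs_ai_name">
  <a href="/citations?user=YUoJP-oAAAAJ&amp;hl=pl">Howon Kim</a></h3></div>
'''
soup = BeautifulSoup(html, "html.parser")

buf = io.StringIO()  # swap in open("authors.csv", "w", newline="") for a real file
writer = csv.writer(buf)
writer.writerow(["author", "url"])
for a in soup.select("h3.gs_ai_name a"):
    writer.writerow([a.get_text(), "https://scholar.google.pl" + a["href"]])
print(buf.getvalue())
```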

Using BeautifulSoup4 with Google Translate

为君一笑 submitted on 2019-12-11 13:46:19

Question: I am currently going through the web scraping section of Automate the Boring Stuff and trying to write a script that extracts translated words from Google Translate using BeautifulSoup4. I inspected the HTML content of a page where 'Explanation' is the translated word:

```html
<span id="result_box" class="short_text" lang="en">
  <span class>Explanation</span>
</span>
```

Using BeautifulSoup4, I tried different selectors, but nothing would return the translated word. Here are a few examples I tried, but they…
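The usual explanation for this symptom: the browser inspector shows the DOM after JavaScript has filled in `#result_box`, whereas `requests` only fetches the initial HTML, in which the span is still empty, so no selector can find the word. A JavaScript-capable tool (e.g. Selenium) or the translation API is needed for the live page. The selector itself is fine, as parsing the inspected snippet directly shows:

```python
from bs4 import BeautifulSoup

# Parsing the markup as the inspector showed it: the selector works.
# On the live page this text simply is not present in the fetched HTML.
html = ('<span id="result_box" class="short_text" lang="en">'
        '<span>Explanation</span></span>')
soup = BeautifulSoup(html, "html.parser")
word = soup.select_one("#result_box span").get_text()
print(word)  # 'Explanation'
```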

How do I avoid data from different tabs to be concatenated in one cell when I scrape a table?

拟墨画扇 submitted on 2019-12-11 05:55:25

Question: I scraped this page, https://www.capfriendly.com/teams/bruins, specifically looking for the tables under the Cap Hit tab (Forwards, Defense, Goaltenders). I used Python and BeautifulSoup4, with CSV as the output format:

```python
import requests, bs4

r = requests.get('https://www.capfriendly.com/teams/bruins')
soup = bs4.BeautifulSoup(r.text, 'lxml')
table = soup.find(id="team")
with open("csvfile.csv", "w", newline='') as team_data:
    for tr in table('tr', class_=['odd', 'even']):  # get all tr whose class…
```
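When a single `<td>` stacks several values (one per tab), calling `get_text()` on it concatenates them into one cell. An offline sketch of the two usual remedies, passing a separator or selecting just one child element; the markup is a stand-in for a CapFriendly cell, not its real structure:

```python
from bs4 import BeautifulSoup

# A cell holding two stacked values, as tabbed tables often do.
html = '<td><span>Forward</span><span>$1,000,000</span></td>'
cell = BeautifulSoup(html, "html.parser").td

print(cell.get_text())       # 'Forward$1,000,000' -- glued together
print(cell.get_text(";"))    # 'Forward;$1,000,000' -- separator keeps them apart
print(cell.span.get_text())  # 'Forward' -- pick one child explicitly
```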

Can't get the 'rel' attribute via BeautifulSoup web scraping in Python

假如想象 submitted on 2019-12-11 05:46:28

Question: I am testing BeautifulSoup4 web-scraping code on a website. I have done most of it, but one piece of attribute information is tricky for me to get at because of its location. The markup goes like this:

```html
<span class="callseller-description-icon">
  <a id="phone-lead" class="callseller-description-link" rel="0501365082" href="#">Show Phone Number</a>
</span>
```

I am trying this, but I'm not sure if it's okay:

```python
try:
    phone = soup.find('a', {'id': 'phone-lead'})
    for a in phone:
        phone_result = str(a.get_text('rel').strip().encode(
```
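`get_text()` returns the tag's text ("Show Phone Number"), never its attributes; attributes are read by subscripting the tag. A minimal sketch against the snippet from the question; note that Beautiful Soup treats `rel` as a multi-valued HTML attribute, so it comes back as a list:

```python
from bs4 import BeautifulSoup

html = ('<a id="phone-lead" class="callseller-description-link" '
        'rel="0501365082" href="#">Show Phone Number</a>')
a = BeautifulSoup(html, "html.parser").find("a", id="phone-lead")

print(a.get_text())   # 'Show Phone Number' -- the visible text
print(a["rel"])       # ['0501365082'] -- rel is parsed as a list
print(a["rel"][0])    # '0501365082' -- the number itself
```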

Beautiful Soup extracts just the header of a table

北慕城南 submitted on 2019-12-11 05:37:20

Question: I want to extract the information from the table on the following website, using Beautiful Soup in Python 3.5: http://www.askapatient.com/viewrating.asp?drug=19839&name=ZOLOFT

I have to save the web page first, since my program needs to work offline. I saved the page on my computer and used the following code to extract the table information, but the problem is that the code extracts just the heading of the table. This is my code:

```python
from urllib.request import Request, urlopen
from bs4
```
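The question's code is cut off, but the symptom (only the header row comes back) usually means the loop stops at the first `<tr>` or matches only `<th>` cells. A self-contained sketch that walks every row and picks up both cell types; the inline markup and the `ratingsTable` id are stand-ins, not the page's real structure:

```python
from bs4 import BeautifulSoup

# Stand-in for the saved page's ratings table.
html = '''
<table id="ratingsTable">
  <tr><th>Rating</th><th>Reason</th></tr>
  <tr><td>5</td><td>depression</td></tr>
  <tr><td>4</td><td>anxiety</td></tr>
</table>
'''
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find("table", id="ratingsTable").find_all("tr"):
    # Collect header and data cells alike, one list per row.
    rows.append([c.get_text(strip=True) for c in tr.find_all(["th", "td"])])
print(rows)
```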

Using find_all in BS4 to get text as a list

拟墨画扇 submitted on 2019-12-11 04:38:06

Question: I'll start by saying I'm very new to Python. I've been building a Discord bot with discord.py and Beautiful Soup 4. Here's where I'm at:

```python
@commands.command(hidden=True)
async def roster(self):
    """Gets a list of CD's members"""
    url = "http://www.clandestine.pw/roster.html"
    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")
    try:
        text = soupObject.find_all("font", attrs={'size': '4'})
        await self.bot.say(text)
    except:
        await self.bot.say("Not
```
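`find_all` returns a list of `Tag` objects, so `self.bot.say(text)` posts the raw markup rather than the member names. A minimal offline sketch of turning the result set into a list of plain strings; the inline HTML is a stand-in for the roster page:

```python
from bs4 import BeautifulSoup

# Stand-in for the roster page's markup.
html = '<font size="4">Alice</font><font size="4">Bob</font>'
soup = BeautifulSoup(html, "html.parser")

# Pull .get_text() from each Tag to get plain strings.
names = [tag.get_text() for tag in soup.find_all("font", attrs={"size": "4"})]
print(names)  # ['Alice', 'Bob']
```

In the bot, something like `await self.bot.say(", ".join(names))` would then post a readable list.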

How to extract a link from an <a> inside <h2 class="section-heading">: BeautifulSoup [duplicate]

做~自己de王妃 submitted on 2019-12-11 01:08:46

Question: This question already has an answer here: BeautifulSoup getting href [duplicate] (1 answer). Closed 3 years ago.

I am trying to extract a link which is written like this:

```html
<h2 class="section-heading">
  <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>
```

My code is:

```python
from bs4 import BeautifulSoup
import requests, re

def get_data():
    url = 'http://www.nytimes.com/'
    s_code = requests.get(url)
    plain_text = s_code.text
    soup = BeautifulSoup(plain_text)
    head_links = soup.findAll('h2', {
```
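Once the `h2.section-heading` elements are found, the link lives in the `href` attribute of the nested `<a>`. A minimal sketch against the snippet from the question:

```python
from bs4 import BeautifulSoup

html = '''<h2 class="section-heading">
  <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>'''
soup = BeautifulSoup(html, "html.parser")

# For each matching heading, read the nested anchor's href attribute.
links = [h2.a["href"]
         for h2 in soup.find_all("h2", class_="section-heading")
         if h2.a]
print(links)  # ['http://www.nytimes.com/pages/arts/index.html']
```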