bs4

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml [duplicate]

☆樱花仙子☆ submitted on 2019-12-12 03:42:32

Question: This question already has answers here: beautifulsoup won't recognize lxml (2 answers). Closed 3 years ago.

Can you please suggest a fix? The script downloads almost all of the images from imgur pages, but it fails on pages with a single image. I'm not sure why it doesn't work in that case or how to fix it:

```python
elif 'imgur.com' in submission.url and not (submission.url.endswith('gif')
        or submission.url.endswith('webm')
        or submission.url.endswith('mp4')
        or 'all' in submission.url
        or '#' in submission.url
        or '/a/' in submission.url
```
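The error in the title means Beautiful Soup could not load the lxml tree builder, which almost always means lxml is not installed in the running interpreter (`pip install lxml` fixes it). A minimal sketch of a guard that prefers lxml but falls back to the stdlib parser:

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup):
    """Prefer the faster lxml builder, but fall back to the
    stdlib html.parser when lxml is not installed."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")

soup = make_soup("<p>hello</p>")
print(soup.p.text)  # 'hello' with either builder
```

Pinning one builder explicitly (rather than omitting the argument) also silences Beautiful Soup's "no parser was explicitly specified" warning.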

bytes object has no attribute find_all

你说的曾经没有我的故事 submitted on 2019-12-12 03:24:50

Question: I've been trying for the last three hours to scrape this website and get the rank, name, wins, and losses of each team. When I run this code:

```python
import requests
from bs4 import BeautifulSoup

halo = requests.get("https://www.halowaypoint.com/en-us/esports/standings")
page = BeautifulSoup(halo.content, "html.parser")
final = page.encode('utf-8')
print(final.find_all("div"))
```

I keep getting this error. If anyone can help me out, it would be much appreciated. Thanks!

Answer 1: You are calling the…
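The answer is cut off above, but the cause follows from the traceback in the title: `page.encode('utf-8')` returns a `bytes` object, and `find_all` is a method of the soup, not of bytes. A minimal sketch of the fix, using a stand-in HTML string so it runs offline instead of hitting the live standings page:

```python
from bs4 import BeautifulSoup

# find_all lives on the BeautifulSoup object; encoding the soup first
# produces bytes, which has no find_all -- hence the AttributeError.
html = "<div>Team A</div><div>Team B</div>"  # stand-in for halo.content
page = BeautifulSoup(html, "html.parser")
teams = [div.get_text() for div in page.find_all("div")]
print(teams)  # ['Team A', 'Team B']
```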

How to input values and click a button with Requests?

寵の児 submitted on 2019-12-12 01:54:21

Question: With the requests module I eventually want to download a song. If you head to youtube-mp3.org, there is one input bar and one convert button. Shortly after the conversion is finished, a download button appears. I want to go through that process with my Python script. So far I have this:

```python
def download_song(song_name):
    import requests
    with requests.Session() as c:
        url = r"http://www.youtube-mp3.org/"
        c.get(url)
```

It's barely anything… I have tried to check the documentation on their website. I…
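Worth noting as context for the question: `requests` has no concept of typing into inputs or clicking buttons; it only sends HTTP requests, so the task is to reproduce the request the convert button would fire (sites like this one typically build that request with JavaScript, which `requests` cannot run). A sketch with `requests.Request` showing what such a form submission looks like on the wire; the endpoint URL and the `video_url` field name are placeholders, not the site's real interface:

```python
import requests

# Hypothetical form action and field name -- "clicking the button" in
# the browser would amount to sending this POST.
req = requests.Request(
    "POST",
    "https://example.org/convert",
    data={"video_url": "https://www.youtube.com/watch?v=XXXX"},
)
prepared = req.prepare()
print(prepared.method, prepared.url)
print(prepared.body)  # the urlencoded form data the button would submit
```

To discover the real endpoint and fields, watch the network tab of the browser's developer tools while pressing the button, then replay that request inside the `requests.Session`.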

Get author names and URLs for a tag from Google Scholar

。_饼干妹妹 submitted on 2019-12-12 01:36:13

Question: I wish to write to a CSV file a list of all authors, together with their URLs, who class themselves under a specific tag on Google Scholar. For example, if we were to take 'security', I would want this output:

```
author         url
Howon Kim      https://scholar.google.pl/citations?user=YUoJP-oAAAAJ&hl=pl
Adrian Perrig  https://scholar.google.pl/citations?user=n-Oret4AAAAJ&hl=pl
...            ...
```

I have written this code, which prints each author's name:

```python
# -*- coding: utf-8 -*-
import urllib.request
import csv
from bs4
```
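The question's code is cut off, but the shape of the task is clear: collect (name, URL) pairs and write them with the `csv` module. A self-contained sketch against an inline snippet of markup; the `gs_ai_name` class and the surrounding structure are assumptions about Google Scholar's author-list markup at the time, and may well have changed:

```python
import csv
import io

from bs4 import BeautifulSoup

# Stand-in for one author card from a Scholar tag-search results page.
html = '''
<div class="gsc_1usr"><h3 class="gs_ai_name">
  <a href="/citations?user=YUoJP-oAAAAJ&amp;hl=pl">Howon Kim</a></h3></div>
'''
soup = BeautifulSoup(html, "html.parser")

buf = io.StringIO()  # swap in open("authors.csv", "w", newline="") for a real file
writer = csv.writer(buf)
writer.writerow(["author", "url"])
for a in soup.select("h3.gs_ai_name a"):
    writer.writerow([a.get_text(), "https://scholar.google.pl" + a["href"]])
print(buf.getvalue())
```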

Using BeautifulSoup4 with Google Translate

为君一笑 submitted on 2019-12-11 13:46:19

Question: I am currently going through the web scraping section of Automate the Boring Stuff and trying to write a script that extracts translated words from Google Translate using BeautifulSoup4. I inspected the HTML content of a page where 'Explanation' is the translated word:

```html
<span id="result_box" class="short_text" lang="en">
  <span class>Explanation</span>
</span>
```

Using BeautifulSoup4, I tried different selectors, but nothing would return the translated word. Here are a few examples I tried, but they…
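The usual explanation for this symptom: the browser inspector shows the DOM after JavaScript has filled in `#result_box`, whereas `requests` only fetches the initial HTML, in which the span is still empty, so no selector can find the word. A JavaScript-capable tool (e.g. Selenium) or the translation API is needed for the live page. The selector itself is fine, as parsing the inspected snippet directly shows:

```python
from bs4 import BeautifulSoup

# Parsing the markup as the inspector showed it: the selector works.
# On the live page this text simply is not present in the fetched HTML.
html = ('<span id="result_box" class="short_text" lang="en">'
        '<span>Explanation</span></span>')
soup = BeautifulSoup(html, "html.parser")
word = soup.select_one("#result_box span").get_text()
print(word)  # 'Explanation'
```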

How do I avoid data from different tabs to be concatenated in one cell when I scrape a table?

拟墨画扇 submitted on 2019-12-11 05:55:25

Question: I scraped this page, https://www.capfriendly.com/teams/bruins, specifically looking for the tables under the Cap Hit tab (Forwards, Defense, Goaltenders). I used Python and BeautifulSoup4, with CSV as the output format:

```python
import requests, bs4

r = requests.get('https://www.capfriendly.com/teams/bruins')
soup = bs4.BeautifulSoup(r.text, 'lxml')
table = soup.find(id="team")
with open("csvfile.csv", "w", newline='') as team_data:
    for tr in table('tr', class_=['odd', 'even']):  # get all tr whose class…
```
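When a single `<td>` stacks several values (one per tab), calling `get_text()` on it concatenates them into one cell. An offline sketch of the two usual remedies, passing a separator or selecting just one child element; the markup is a stand-in for a CapFriendly cell, not its real structure:

```python
from bs4 import BeautifulSoup

# A cell holding two stacked values, as tabbed tables often do.
html = '<td><span>Forward</span><span>$1,000,000</span></td>'
cell = BeautifulSoup(html, "html.parser").td

print(cell.get_text())       # 'Forward$1,000,000' -- glued together
print(cell.get_text(";"))    # 'Forward;$1,000,000' -- separator keeps them apart
print(cell.span.get_text())  # 'Forward' -- pick one child explicitly
```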

Can't get the 'rel' attribute via BeautifulSoup web scraping in Python

假如想象 submitted on 2019-12-11 05:46:28

Question: I am testing BeautifulSoup4 web-scraping code on a website. I have done most of it, but one piece of attribute information is tricky for me to get at because of its location. The markup goes like this:

```html
<span class="callseller-description-icon">
  <a id="phone-lead" class="callseller-description-link" rel="0501365082" href="#">Show Phone Number</a>
</span>
```

I am trying this, but I'm not sure if it's okay:

```python
try:
    phone = soup.find('a', {'id': 'phone-lead'})
    for a in phone:
        phone_result = str(a.get_text('rel').strip().encode(
```
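`get_text()` returns the tag's text ("Show Phone Number"), never its attributes; attributes are read by subscripting the tag. A minimal sketch against the snippet from the question; note that Beautiful Soup treats `rel` as a multi-valued HTML attribute, so it comes back as a list:

```python
from bs4 import BeautifulSoup

html = ('<a id="phone-lead" class="callseller-description-link" '
        'rel="0501365082" href="#">Show Phone Number</a>')
a = BeautifulSoup(html, "html.parser").find("a", id="phone-lead")

print(a.get_text())   # 'Show Phone Number' -- the visible text
print(a["rel"])       # ['0501365082'] -- rel is parsed as a list
print(a["rel"][0])    # '0501365082' -- the number itself
```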

Beautiful Soup extracts just the header of a table

北慕城南 submitted on 2019-12-11 05:37:20

Question: I want to extract the information from the table on the following website, using Beautiful Soup in Python 3.5: http://www.askapatient.com/viewrating.asp?drug=19839&name=ZOLOFT

I have to save the web page first, since my program needs to work offline. I saved the page on my computer and used the following code to extract the table information, but the problem is that the code extracts just the heading of the table. This is my code:

```python
from urllib.request import Request, urlopen
from bs4
```
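The question's code is cut off, but the symptom (only the header row comes back) usually means the loop stops at the first `<tr>` or matches only `<th>` cells. A self-contained sketch that walks every row and picks up both cell types; the inline markup and the `ratingsTable` id are stand-ins, not the page's real structure:

```python
from bs4 import BeautifulSoup

# Stand-in for the saved page's ratings table.
html = '''
<table id="ratingsTable">
  <tr><th>Rating</th><th>Reason</th></tr>
  <tr><td>5</td><td>depression</td></tr>
  <tr><td>4</td><td>anxiety</td></tr>
</table>
'''
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find("table", id="ratingsTable").find_all("tr"):
    # Collect header and data cells alike, one list per row.
    rows.append([c.get_text(strip=True) for c in tr.find_all(["th", "td"])])
print(rows)
```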

Using find_all in BS4 to get text as a list

拟墨画扇 submitted on 2019-12-11 04:38:06

Question: I'll start by saying I'm very new to Python. I've been building a Discord bot with discord.py and Beautiful Soup 4. Here's where I'm at:

```python
@commands.command(hidden=True)
async def roster(self):
    """Gets a list of CD's members"""
    url = "http://www.clandestine.pw/roster.html"
    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")
    try:
        text = soupObject.find_all("font", attrs={'size': '4'})
        await self.bot.say(text)
    except:
        await self.bot.say("Not
```
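`find_all` returns a list of `Tag` objects, so `self.bot.say(text)` posts the raw markup rather than the member names. A minimal offline sketch of turning the result set into a list of plain strings; the inline HTML is a stand-in for the roster page:

```python
from bs4 import BeautifulSoup

# Stand-in for the roster page's markup.
html = '<font size="4">Alice</font><font size="4">Bob</font>'
soup = BeautifulSoup(html, "html.parser")

# Pull .get_text() from each Tag to get plain strings.
names = [tag.get_text() for tag in soup.find_all("font", attrs={"size": "4"})]
print(names)  # ['Alice', 'Bob']
```

In the bot, something like `await self.bot.say(", ".join(names))` would then post a readable list.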

How to extract a link from an <a> inside <h2 class="section-heading">: BeautifulSoup [duplicate]

做~自己de王妃 submitted on 2019-12-11 01:08:46

Question: This question already has an answer here: BeautifulSoup getting href [duplicate] (1 answer). Closed 3 years ago.

I am trying to extract a link which is written like this:

```html
<h2 class="section-heading">
  <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>
```

My code is:

```python
from bs4 import BeautifulSoup
import requests, re

def get_data():
    url = 'http://www.nytimes.com/'
    s_code = requests.get(url)
    plain_text = s_code.text
    soup = BeautifulSoup(plain_text)
    head_links = soup.findAll('h2', {
```
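Once the `h2.section-heading` elements are found, the link lives in the `href` attribute of the nested `<a>`. A minimal sketch against the snippet from the question:

```python
from bs4 import BeautifulSoup

html = '''<h2 class="section-heading">
  <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>'''
soup = BeautifulSoup(html, "html.parser")

# For each matching heading, read the nested anchor's href attribute.
links = [h2.a["href"]
         for h2 in soup.find_all("h2", class_="section-heading")
         if h2.a]
print(links)  # ['http://www.nytimes.com/pages/arts/index.html']
```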