beautifulsoup

Scraper in Python gives “Access Denied”

大兔子大兔子 submitted on 2020-07-15 19:22:55

Question: I'm trying to write a scraper in Python to get some info from a page, such as the titles of the offers that appear on this page: https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585 So far I use this code:

    import bs4
    import requests

    def extract_source(url):
        source = requests.get(url).text
        return source

    def extract_data(source):
        soup = bs4.BeautifulSoup(source)
        names = soup.findAll('title')
        for i in names:
            print i

    extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))
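The "Access Denied" response usually comes from the site rejecting requests that carry the default python-requests User-Agent. Below is a minimal sketch of one common workaround: sending browser-like headers. The header values and the choice of parser are illustrative assumptions, not taken from the original question.

    import bs4
    import requests

    # Browser-like headers; many sites block the default python-requests User-Agent.
    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
    }

    def extract_source(url):
        # Pass the headers with every request so the server sees a "normal" browser.
        return requests.get(url, headers=HEADERS).text

    def extract_data(source):
        # Name the parser explicitly to avoid bs4's "no parser specified" warning.
        soup = bs4.BeautifulSoup(source, 'html.parser')
        for title in soup.find_all('title'):
            print(title.text)

    extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))

If the site still blocks the request, the page is probably protected beyond simple header checks, and a browser-driven approach such as Selenium may be needed.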

Parsing HTML files in the same directory in Python

。_饼干妹妹 submitted on 2020-07-10 10:32:49

Question: I have written code to parse HTML files:

    from bs4 import BeautifulSoup
    import re
    import os
    from os.path import join

    for (dirname, dirs, files) in os.walk('.'):
        for filename in files:
            if filename.endswith('.html'):
                thefile = os.path.join(dirname, filename)
                with open(thefile, 'r') as f:
                    contents = f.read()
                    soup = BeautifulSoup(contents, 'lxml')
                    Initialtext = soup.get_text()
                    MediumText = Initialtext.lower().split()
                    clean_tokens = [t for t in text2 if re.match(r'[^\W\d]*$', t)]
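A minimal sketch of the same walk-and-parse loop follows. It assumes the list comprehension was meant to filter MediumText (the original refers to an undefined name, text2) and that the lxml parser is installed; the explicit file encoding and the final print are additions for illustration.

    import os
    import re
    from bs4 import BeautifulSoup

    for dirname, dirs, files in os.walk('.'):
        for filename in files:
            if not filename.endswith('.html'):
                continue
            thefile = os.path.join(dirname, filename)
            # Read with an explicit encoding so odd characters don't raise on some platforms.
            with open(thefile, 'r', encoding='utf-8', errors='ignore') as f:
                contents = f.read()
            soup = BeautifulSoup(contents, 'lxml')
            initial_text = soup.get_text()
            tokens = initial_text.lower().split()
            # Keep only tokens made of letters (no digits, no punctuation).
            clean_tokens = [t for t in tokens if re.match(r'[^\W\d]*$', t)]
            print(thefile, len(clean_tokens))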

Can't get data in table form using Selenium Python

拜拜、爱过 submitted on 2020-07-10 03:19:04

Question: I am new to scraping with Selenium in Python. I can retrieve some of the data, but I want it in table form, as it is displayed on the web page. Here is what I have so far:

    url = 'https://definitivehc.maps.arcgis.com/home/item.html?id=1044bb19da8d4dbfb6a96eb1b4ebf629&view=list&showFilters=false#data'
    browser = webdriver.Chrome(r"C:\task\chromedriver")
    browser.get(url)
    time.sleep(25)
    rows_in_table = browser.find_elements_by_xpath('//table[@class="dgrid-row-table"]//tr[th or td]')
    for element in rows_in_table:
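A sketch of one way to rebuild the rows as a list of lists by reading the th/td cells of each matched row. The row XPath and the driver path are reused from the question; the per-row cell handling is an assumption about the page's grid markup.

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    url = ('https://definitivehc.maps.arcgis.com/home/item.html'
           '?id=1044bb19da8d4dbfb6a96eb1b4ebf629&view=list&showFilters=false#data')

    browser = webdriver.Chrome(r"C:\task\chromedriver")
    browser.get(url)
    time.sleep(25)  # crude wait for the JavaScript grid to render

    table_data = []
    rows = browser.find_elements(By.XPATH, '//table[@class="dgrid-row-table"]//tr[th or td]')
    for row in rows:
        # Collect the text of every header/data cell in this row, in order.
        cells = row.find_elements(By.XPATH, './/th | .//td')
        table_data.append([cell.text for cell in cells])

    for row in table_data:
        print(row)

    browser.quit()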

Gibberish text output because of encoding in web scraping

人盡茶涼 submitted on 2020-07-09 14:20:37

Question: I'm trying to get text in Persian from Google Translate, and the best encoding for Persian is UTF-8. Google Translate uses JavaScript to render its HTML, so I'm using the requests-html module for this. The problem is the output I get each time, both when I use print() and when I try to write it to a file: both give me gibberish, non-Persian text, and I know it's because of the encoding or something like that. So I was trying to change
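A minimal sketch of one common fix: forcing UTF-8 before reading the response text and writing the output file with an explicit encoding. It assumes the requests-html package (which downloads Chromium the first time render() runs); the URL and the body selector are placeholders, not taken from the question.

    from requests_html import HTMLSession

    url = 'https://example.com/persian-page'  # placeholder URL

    session = HTMLSession()
    r = session.get(url)
    # requests guesses the encoding from the headers; override it before touching the text.
    r.encoding = 'utf-8'
    r.html.render()  # execute the page's JavaScript (downloads Chromium on first run)

    text = r.html.find('body', first=True).text

    print(text)
    # Write with an explicit encoding so the file does not come out as mojibake.
    with open('output.txt', 'w', encoding='utf-8') as f:
        f.write(text)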

How to extract the text in the textarea frame of the DeepL page?

旧巷老猫 submitted on 2020-07-09 12:52:43

Question: From https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F we see this: [screenshot of the translated page]. But the translated text "Bonjour, comment allez-vous aujourd'hui?" doesn't appear anywhere in the page's source, and the frame's code looks like:

    <textarea class="lmt__textarea lmt__target_textarea lmt__textarea_base_style" data-gramm_editor="false" tabindex="110" dl-test="translator-target-input" lang="fr-FR" style="height: 300px;"></textarea>

And no matter how I read the text or source
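The translation is filled in by JavaScript after the page loads, so it never appears in the static source. A sketch of one way to read it with Selenium is to wait until the target textarea's value attribute is non-empty; the 15-second timeout and the use of the dl-test attribute as a CSS selector are assumptions.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    url = 'https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F'

    driver = webdriver.Chrome()
    driver.get(url)

    # The translated text lives in the textarea's value, which JavaScript fills in
    # after the page loads, so wait until it is non-empty instead of reading page_source.
    selector = 'textarea[dl-test="translator-target-input"]'
    translated = WebDriverWait(driver, 15).until(
        lambda d: d.find_element(By.CSS_SELECTOR, selector).get_attribute('value').strip()
    )

    print(translated)
    driver.quit()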

Max retries exceeded with URL Selenium [duplicate]

六眼飞鱼酱① submitted on 2020-07-09 12:05:32

Question: This question already has answers here: MaxRetryError: HTTPConnectionPool: Max retries exceeded (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused'))) (2 answers). Closed 8 months ago. I'm looking to traverse an array of URLs and open each one for web scraping with Selenium. The problem is that as soon as I hit the second browser.get(url), I get 'Max retries exceeded with URL' and 'No connection could be made because the target machine actively refused it'.
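That error typically means the WebDriver session was shut down before the second get() call, for example by calling browser.quit() or browser.close() inside the loop. A sketch of the usual pattern, reusing one driver for every URL and quitting only at the end; the urls list here is a placeholder.

    from selenium import webdriver

    urls = [
        'https://example.com/page1',  # placeholder URLs
        'https://example.com/page2',
    ]

    browser = webdriver.Chrome()  # create the driver once, outside the loop

    for url in urls:
        browser.get(url)          # reuse the same session for every URL
        print(browser.title)      # ...scrape whatever is needed here...
        # do NOT call browser.quit() or browser.close() inside the loop

    browser.quit()                # shut the session down once, after the loop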

Web Scraping Extract Javascript Table Selenium+Python

佐手、 submitted on 2020-07-07 13:07:06

Question: I've read several articles on web scraping with Selenium, but I didn't understand how to find the elements on the site. The site whose table I want to scrape is below: http://www.bmfbovespa.com.br/pt_br/servicos/market-data/cotacoes/mercado-de-derivativos/?symbol=DI1 I want to scrape the tables "TB01", "TB02", "TB03" and "TB04"; these are the ids of the tables:

    <tbody>
      <tr>
        <td id="TB01">...</td>
        <td id="TB02">...</td>
        <td id="TB03">...</td>
        <td id="TB04">...</td>
      </tr>
    </tbody>

I've tried all the find_element methods
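A sketch of one way to pick those cells up by id with Selenium, waiting for the JavaScript-rendered table before reading the text; the 20-second timeout is an assumption.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    url = ('http://www.bmfbovespa.com.br/pt_br/servicos/market-data/cotacoes/'
           'mercado-de-derivativos/?symbol=DI1')

    driver = webdriver.Chrome()
    driver.get(url)

    # The table is built by JavaScript, so wait until the first cell exists.
    WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, 'TB01')))

    # Read each cell by its id attribute.
    for cell_id in ('TB01', 'TB02', 'TB03', 'TB04'):
        cell = driver.find_element(By.ID, cell_id)
        print(cell_id, cell.text)

    driver.quit()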
