web-scraping | 易学教程

Scraping string from a large number of URLs with Julia

Read more about Scraping string from a large number of URLs with Julia

Question: Happy New Year! I have just started learning Julia, and the first mini challenge I have set myself is to scrape data from a large list of URLs. I have ca. 50k URLs (which I successfully parsed from a JSON file with Julia using Regex) in a CSV file. I want to scrape each one and return a matched string ("/page/12345/view", where 12345 is any integer). I managed to do so using HTTP and Queryverse (although I had started with CSV and CSVFiles, and am looking at packages for learning purposes) but the script
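The matching step described in the question can be sketched in a few lines. The question itself is about Julia; this is only a minimal Python illustration of the pattern, and the function name `find_page_view` and the sample page bodies are hypothetical stand-ins for the downloaded pages, not the asker's code.

```python
import re

# Pattern for the target described in the question: "/page/<integer>/view".
PAGE_VIEW_PATTERN = re.compile(r"/page/\d+/view")


def find_page_view(body):
    """Return the first "/page/<integer>/view" substring in body, or None."""
    match = PAGE_VIEW_PATTERN.search(body)
    return match.group(0) if match else None


# Hypothetical page bodies standing in for downloaded HTML.
sample_bodies = [
    '<a href="/page/12345/view">details</a>',
    "<p>no matching link here</p>",
]
results = [find_page_view(body) for body in sample_bodies]
print(results)
```

Returning None for pages without a match keeps one result per URL, so the output stays aligned with the input list.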

Python's requests triggers Cloudflare's security while urllib does not

Read more about Python's requests triggers Cloudflare's security while urllib does not

Question: I'm working on an automated web scraper for a restaurant website, but I'm having an issue. The website uses Cloudflare's anti-bot security, which I would like to bypass: not the Under-Attack-Mode, but a captcha test that only triggers when it detects a non-American IP or a bot. I'm trying to bypass it, as Cloudflare's security doesn't trigger when I clear cookies, disable JavaScript, or use an American proxy. Knowing this, I tried using Python's requests library as such: import

How to scrape json data from an interactive chart?

Read more about How to scrape json data from an interactive chart?

Question: I have a specific section of a website that I want to scrape data from, and here's a screenshot of the section. I inspected the elements of that section and noticed that it is within a canvas tag. However, I also checked the source code of the website and found that the data lies within the source code in a format I'm not familiar with. Here's a sample of that data: JSON.parse('\x5B\x7B\x22id\x22\x3A\x2232522\x22,\x22minute\x22\x3A\x2222\x22,\x22result\x22\x3A\x22MissedShots\x22,
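The unfamiliar format in the sample is ordinary JSON with each punctuation character written as a `\xNN` hex escape (for example, `\x5B` is `[` and `\x22` is `"`). A minimal sketch of decoding it follows. It assumes the string has already been extracted from the page source; the helper name `decode_hex_escapes` is hypothetical, and because the sample in the question is cut off, the closing `\x7D\x5D` is added here only to make a complete array.

```python
import json
import re

# Hypothetical sample: the fragment quoted in the question, closed with
# "\x7D\x5D" ("}]") so that it forms a complete JSON array.
raw = (
    r"\x5B\x7B\x22id\x22\x3A\x2232522\x22,"
    r"\x22minute\x22\x3A\x2222\x22,"
    r"\x22result\x22\x3A\x22MissedShots\x22\x7D\x5D"
)


def decode_hex_escapes(text):
    """Replace every \\xNN escape with the character it encodes."""
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


decoded = decode_hex_escapes(raw)  # plain JSON text
shots = json.loads(decoded)        # list of dicts
print(shots[0]["result"])
```

Substituting the escapes with a regular expression leaves any other characters in the string untouched.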

Using selenium to retrieve data from webpage - not retrieving all data

Read more about Using selenium to retrieve data from webpage - not retrieving all data

Question: I am trying to retrieve data (coin name, price, market cap and circulating supply) from coinmarketcap.com, but when I run the code below I only get 11 coin names. Plus, I am not able to retrieve the other data. I have tried several options, but none were successful. My goal is to store the data in a dataframe so I can analyze it. driver = webdriver.Chrome(r'C:\Users\Ejer\PycharmProjects\pythonProject\chromedriver') driver.get('https://coinmarketcap.com/') Crypto = driver.find_elements_by_xpath("/

How can I bypass a cookie agreement page while web scraping using Python?

Read more about How can I bypass a cookie agreement page while web scraping using Python?

Question: I've run into a cookie agreement page... What I am doing: import requests url = "https://stockhouse.com/community/bullboards/" r = requests.get(url) soup = BeautifulSoup(r.content, "html.parser") print(soup) which returns HTML from a cookie agreement page. What I am looking for is a way to bypass this page and scrape the content of the actual page once the cookies are accepted... I tried the code from this question: cookies = dict(BCPermissionLevel='PERSONAL') html = requests.get(website,

'NoneType' object has no attribute 'text' in BeautifulSoup

Read more about 'NoneType' object has no attribute 'text' in BeautifulSoup

Question: I am trying to scrape Google results when I search "What is 2+2", but the following code is returning 'NoneType' object has no attribute 'text'. Please help me achieve the required goal. text="What is 2+2" search=text.replace(" ","+") link="https://www.google.com/search?q="+search headers={'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'} source=requests.get(link,headers=headers).text soup=BeautifulSoup(source,
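One detail in the quoted code is worth illustrating: replacing spaces with "+" leaves the literal "+" in "2+2" unencoded, and in a query string a bare "+" is conventionally read as a space. The sketch below, using only the standard library, compares that approach with `urllib.parse.quote_plus`. It is not presented as the cause of the 'NoneType' error, only as a safer way to build the search URL; the variable names `naive` and `encoded` are illustrative.

```python
from urllib.parse import quote_plus

text = "What is 2+2"

# Approach from the question: only spaces are replaced, so the literal "+"
# in "2+2" is left as-is and would be read as a space in a query string.
naive = text.replace(" ", "+")

# quote_plus turns spaces into "+" and percent-encodes a literal "+" as "%2B".
encoded = quote_plus(text)

link = "https://www.google.com/search?q=" + encoded
print(naive)
print(encoded)
```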
