I am trying to scrape the following website: https://angel.co/companies
There is a "More" button at the bottom, which loads more records when clicked.
I need
You are doing it right, you just need to wait a little. The AJAX request fires after the Selenium call returns, so the button may not be in the page yet. You can poll for it with something like this, or use an "assert":
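# find_elements_* returns an empty list when nothing matches,
# whereas find_element_* raises NoSuchElementException.
buttons = []
while not buttons:
    buttons = driver.find_elements_by_class_name("more")
button = buttons[0]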
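A loop like the one above never gives up if the button never appears. Here is a minimal sketch of a bounded version. The helper name wait_for and its timeout and interval parameters are my own, not part of Selenium. It only assumes you pass in a callable that returns something truthy once the element is there.

```python
import time


def wait_for(probe, timeout=10.0, interval=0.25):
    """Call probe() until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)  # avoid hammering the browser in a tight loop


# Possible usage with the Selenium 3 style call from this answer
# (Selenium 4 spells it driver.find_elements(By.CLASS_NAME, "more")):
# buttons = wait_for(lambda: driver.find_elements_by_class_name("more"))
# buttons[0].click()
```

Selenium's own WebDriverWait does the same job, so this is mostly useful if you want to see what the wait is doing.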
You can also try calling the AJAX endpoint directly instead of using Selenium. Try this URL, changing the page parameter:
https://angel.co/companies/startups?ids[]=81494&ids[]=3322647&ids[]=98145&ids[]=32119&ids[]=21604&ids[]=19935&ids[]=480579&ids[]=3062473&ids[]=431924&ids[]=395542&ids[]=154&ids[]=948481&ids[]=197974&ids[]=891681&ids[]=972236&ids[]=686564&ids[]=115616&ids[]=515341&ids[]=1856&ids[]=477880&total=4381226&page=3&sort=signal&new=false&hexdigest=be1927797c1b88f79ae42efd4180ea78d3e9e711
The website returns JSON: a dictionary with a single key, "html", whose value is the HTML code the server sends back.
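To illustrate, here is a rough sketch of unpacking such a response using only the standard library. The sample payload and its markup (the "startup" div and the link) are made up for the example. I have not checked what the real fragment looks like, so adjust the parsing to whatever the endpoint actually returns.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_from_response(body):
    """Parse a JSON body of the form {"html": "<...>"} and return its links."""
    fragment = json.loads(body)["html"]  # the single key described above
    collector = LinkCollector()
    collector.feed(fragment)
    collector.close()
    return collector.links


# Hypothetical payload, only shaped like the real response.
sample = json.dumps(
    {"html": '<div class="startup"><a href="/company/example">Example</a></div>'}
)
print(links_from_response(sample))
```

In practice the body would come from an HTTP request to the URL above, with the page parameter incremented on each call.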
I have tried the same using Java. Add an explicit/fluent wait before checking the list size. The code is below.
driver.get("https://angel.co/companies");
new WebDriverWait(driver, 30).pollingEvery(Duration.ofMillis(100)).withTimeout(Duration.ofSeconds(30))
        .until(ExpectedConditions.elementToBeClickable(By.cssSelector("div.more")));
List<WebElement> elements = driver.findElements(By.cssSelector("div.more"));
System.out.println(elements.size());
The more times you click the MORE button, the more data is loaded. You need to induce WebDriverWait for the button with the text MORE to be clickable, and you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://angel.co/companies")
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='more' and contains(.,'More')]")))
while True:
    try:
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='more' and contains(.,'More')]"))).click()
        print("MORE button clicked")
    except TimeoutException:
        break
driver.quit()
Console Output:
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
MORE button clicked
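The click loop above only stops when the wait times out, which can mean a very long run on a large listing. Here is a sketch of the same loop with an optional cap on the number of clicks. The names click_until_exhausted, make_more_clicker and max_clicks are my own. The Selenium calls are the same ones used in the code block above, imported lazily so the counting logic can be tried without a browser.

```python
def click_until_exhausted(click_once, max_clicks=None):
    """Call click_once() until it returns False or max_clicks is reached.

    Returns the number of successful clicks. Hypothetical helper.
    """
    clicks = 0
    while max_clicks is None or clicks < max_clicks:
        if not click_once():
            break
        clicks += 1
    return clicks


def make_more_clicker(driver, timeout=20):
    """Build a click_once callable for the "More" button (needs selenium)."""
    # Imported here so click_until_exhausted works without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, "//div[@class='more' and contains(.,'More')]")

    def click_once():
        try:
            WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable(locator)
            ).click()
            return True
        except TimeoutException:
            # No clickable button within the timeout: treat it as "no more data".
            return False

    return click_once
```

With a driver set up as above, something like click_until_exhausted(make_more_clicker(driver), max_clicks=50) would stop after 50 clicks or when the button stops appearing, whichever comes first.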