Scraping JSON from AJAX calls

Posted by 你。 on 2021-01-29 10:00:49

Question


Background

Consider this URL:

base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"

I want to make the ajax call for the telephone number:

ajax_url = "https://www.olx.bg/ajax/misc/contact/phone/7XarI/?pt=e3375d9a134f05bbef9e4ad4f2f6d2f3ad704a55f7955c8e3193a1acde6ca02197caf76ffb56977ce61976790a940332147d11808f5f8d9271015c318a9ae729"

Desired result

If I press the button on the site in Chrome, the console shows the result I want:

{"value":"088 *****"}

Debugging

If I open a new tab and paste the ajax_url, I always get an empty value:

{"value":"000 000 000"}

If I try something like:

Bash:

wget $ajax_url

Python:

import requests


json_response = requests.get(ajax_url)

In both cases I receive only the HTML of the site's error-handling page instead of JSON.

Ideas

The browser must be sending something extra with the request. What is it? Maybe a cookie?

How do I get the desired result with Bash or Python?

Edit

The status code of the HTML response is 200.

I also tried curl and got the same HTML error page.
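
Because the status code is 200 in both cases, it cannot be used to tell a real JSON answer from the HTML error page. A minimal sketch of one way to distinguish them in a script, assuming the body has already been fetched as text (the helper name parse_ajax_body is made up for illustration):

```python
import json


def parse_ajax_body(body):
    """Return the decoded JSON object, or None if the body is not JSON
    (for example, an HTML error page served with status 200)."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# A JSON body decodes to a dict; an HTML page yields None.
print(parse_ajax_body('{"value":"000 000 000"}'))
print(parse_ajax_body("<html><body>Error</body></html>"))
```

With requests, the text to pass in would be the response's .text attribute.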

A partial fix

I noticed that if I copy the cookie from the browser and make the request with all of the browser's headers, INCLUDING the cookie, I get the correct result.

# I think the most important header is the cookie.
# DICT_WITH_HEADERS_FROM_BROWSER is a placeholder for the headers copied from the browser.
headers = DICT_WITH_HEADERS_FROM_BROWSER
json_response = requests.get(ajax_url, headers=headers)
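
To check whether the cookie alone is enough, the Cookie header copied from the browser can be turned into a dict and sent on its own. This is only a sketch: the helper name cookie_header_to_dict and the cookie names in the sample string are made up, not the ones the site actually sets.

```python
def cookie_header_to_dict(raw_cookie):
    """Split a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in raw_cookie.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.split("=", 1)  # values may themselves contain '='
        cookies[name.strip()] = value.strip()
    return cookies


# Hypothetical cookie string as it might be copied from the browser's DevTools
raw = "PHPSESSID=abc123; lang=bg"
print(cookie_header_to_dict(raw))
```

The resulting dict can be passed to requests through the cookies argument, for example requests.get(ajax_url, cookies=cookie_header_to_dict(raw)).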

Final question

The only question left: how can I generate the cookie from a Python script?


Answer 1:


First, create a requests Session to store cookies. Then send an HTTP GET request to the page that actually triggers the AJAX request. Any cookie the website creates is sent back in the GET response, and the session stores it. You can then use the same session to call the AJAX API.

Important Note 1: On the original website, the AJAX URL is called with an HTTP POST request. You should not send a GET request to that URL.

Important Note 2: You must also extract phoneToken from the page's JavaScript, where it is stored in a variable such as var phoneToken = 'here is the pt';

Sample code:

import re
import requests

my_session = requests.Session()

# call html website
base_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
base_response = my_session.get(url=base_url)
assert base_response.status_code == 200

# extract phone token from base url response
# [^']+ stops at the closing quote, so the match cannot run past the token
phone_token = re.findall(r"phoneToken\s*=\s*'([^']+)'", base_response.text)[0]

# call ajax api
ajax_path = "/ajax/misc/contact/phone/81i3H/?pt=" + phone_token
ajax_url = "https://www.olx.bg" + ajax_path
ajax_headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,fa;q=0.8',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'Referer': 'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
ajax_response = my_session.post(url=ajax_url, headers=ajax_headers)

print(ajax_response.text)

When you run the code above, the result below is displayed:

{"value":"088 558 9937"}
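
The sample code hard-codes the ad ID (81i3H) in the AJAX path. A minimal sketch of deriving it from the ad URL instead, under the assumption (inferred only from this one example) that the ID is the part between "-ID" and ".html". The helper names and the sample token abc123 are made up for illustration.

```python
import json
import re


def extract_phone_token(html):
    """Pull the phoneToken value out of the page's inline JavaScript."""
    match = re.search(r"phoneToken\s*=\s*'([^']+)'", html)
    if match is None:
        raise ValueError("phoneToken not found in page source")
    return match.group(1)


def ad_id_from_url(ad_url):
    """Assumption: the ad ID is the text between '-ID' and '.html'."""
    match = re.search(r"-ID([A-Za-z0-9]+)\.html", ad_url)
    if match is None:
        raise ValueError("ad ID not found in URL")
    return match.group(1)


def build_ajax_url(ad_url, html):
    """Assemble the phone AJAX URL from the ad URL and its page source."""
    return "https://www.olx.bg/ajax/misc/contact/phone/{}/?pt={}".format(
        ad_id_from_url(ad_url), extract_phone_token(html))


ad_url = "https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html"
sample_html = "<script>var phoneToken = 'abc123';</script>"  # made-up token
print(build_ajax_url(ad_url, sample_html))

# The JSON body can then be decoded to get the number itself.
print(json.loads('{"value":"088 558 9937"}')["value"])
```

In the sample code above, base_response.text would take the place of sample_html, and ajax_response.json()["value"] would return the number directly.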



Answer 2:


import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get(
    'https://www.olx.bg/ad/sobstvenik-tristaen-kamenitsa-1-CID368-ID81i3H.html')

# Click the element that reveals the phone number.
# find_element_by_xpath was removed in Selenium 4.3; use find_element(By.XPATH, ...).
driver.find_element(
    By.XPATH,
    "/html/body/div[3]/section/div[3]/div/div[1]/div[2]/div/ul[1]/li[2]/div/strong").click()
time.sleep(2)  # give the AJAX call time to fill in the number
source = driver.page_source
driver.quit()  # close the headless browser
soup = BeautifulSoup(source, 'html.parser')

phone = soup.find("strong", {'class': 'xx-large'}).text

print(phone)

Output:

088 558 9937


Source: https://stackoverflow.com/questions/59487349/scraping-json-from-ajax-calls
