Python: Log in to an AJAX website using requests

小蘑菇 2021-01-29 05:18

I am trying to log in to a website that seems to use AJAX. The HTML page I want to get has the same URL as the landing page; its content only changes once you log in. Here's my code:

2 Answers
  • 2021-01-29 05:40

    You're posting your login with session.post but then trying to read the logged-in page with urllib. urllib knows nothing about your login (the session cookie, for example) unless you pass that information explicitly. You're also not capturing the response of the POST. Even if you don't need it, keep using the session to request the page (the same URL) again:

    import requests

    s = requests.Session()
    # URL and payload as defined in your own code
    response = s.post(URL, data=payload)
    # response holds the HTTP status, cookies and possibly the "logged in page" HTML;
    # check response.text if that's the case. If it only sets the authentication cookie:
    logged_in_page = s.get(URL)
    

    When you call s.get() on the same session, the cookies you received when logging in are sent again with every subsequent request. Since the site uses AJAX, you also need to check what additional data, headers or cookies the browser sends, and whether it uses GET or POST to retrieve subsequent pages.
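    The cookie re-sending can be observed without contacting any real site: Session.prepare_request builds (but does not send) a request with the session's cookies and headers merged in. This is a minimal sketch; the cookie name "sessionid", its value and the example.com URL are hypothetical placeholders for whatever the real login response sets.

```python
import requests

s = requests.Session()

# Simulate what a successful login response would do: store a cookie in the session.
# "sessionid" and "abc123" are hypothetical placeholders.
s.cookies.set("sessionid", "abc123", domain="www.example.com", path="/")

# Build (but do not send) a follow-up request; the session merges its cookies in.
follow_up = s.prepare_request(requests.Request("GET", "http://www.example.com/cat/"))
print(follow_up.headers.get("Cookie"))  # sessionid=abc123

# A request prepared outside the session carries no cookie at all,
# which is the situation the urllib call was in.
standalone = requests.Request("GET", "http://www.example.com/cat/").prepare()
print(standalone.headers.get("Cookie"))  # None
```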

    For the login post(), the credentials may be sent as query parameters, form data or headers. Check which one your browser uses in the "Network" tab of the developer tools in Firefox or Chrome.
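    To match what the Network tab shows, it can help to see where requests puts each variant. The sketch below prepares (without sending) one request per style, plus a JSON body, which many AJAX logins use. The login URL, the field names and the "X-Auth-Token" header are hypothetical placeholders, not the real site's.

```python
import requests

LOGIN_URL = "http://www.example.com/login/"  # hypothetical; use the URL seen in dev tools

# 1) Credentials in the query string: they show up in the request URL.
as_params = requests.Request("POST", LOGIN_URL, params={"username": "email"}).prepare()
print(as_params.url)  # http://www.example.com/login/?username=email

# 2) Credentials as form data: URL-encoded in the request body.
as_form = requests.Request(
    "POST", LOGIN_URL, data={"username": "email", "password": "secret"}
).prepare()
print(as_form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(as_form.body)                     # username=email&password=secret

# 3) Credentials in a header, e.g. a token ("X-Auth-Token" is a made-up name).
as_header = requests.Request("POST", LOGIN_URL, headers={"X-Auth-Token": "abc123"}).prepare()
print(as_header.headers["X-Auth-Token"])  # abc123

# 4) A JSON body, common for AJAX logins: pass json= instead of data=.
as_json = requests.Request(
    "POST", LOGIN_URL, json={"username": "email", "password": "secret"}
).prepare()
print(as_json.headers["Content-Type"])  # application/json
```

    Whichever form the browser uses, the same keyword (params=, data=, headers= or json=) can be passed to s.post() on your session.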

    Also, don't use a with block for the session, because the session is closed as soon as you exit that block. You probably want s to last longer than just the login, since it's what manages your cookies, etc.

  • 2021-01-29 05:47

    It doesn't look like you sent the actual login request. Try something like:

    import requests

    URL = 'http://www.pogdesign.co.uk/cat/'
    LOGIN_URL = 'http://www.pogdesign.co.uk/login/'  # Or whatever the login request URL is
    payload = {'password': 'password', 'sub_login': 'Account Login', 'username': 'email'}

    s = requests.Session()
    s.post(LOGIN_URL, data=payload)  # the session stores any cookies the login sets
    response = s.get(URL)            # those cookies are sent again automatically
    print(response.text)
    # >> your /cat/ content
    

    The nice thing about Session is that it carries your cookies for you by default, so once a session is authenticated it stays authenticated for later requests. I have an example at https://github.com/BWStearns/WhiteTruffleScraper which uses a session login.
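    To watch the whole flow work end to end without touching the real site, this sketch starts a tiny local server and logs in with one long-lived Session. The server is an invented stand-in: POST /login/ sets a "sessionid" cookie and GET /cat/ returns a different page only when that cookie comes back. All paths, field names and the cookie are hypothetical; a real AJAX site will have its own endpoints.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in: login sets a cookie, /cat/ checks for it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode())
        self.send_response(200)
        if form.get("username") == ["email"] and form.get("password") == ["password"]:
            self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        logged_in = "sessionid=abc123" in self.headers.get("Cookie", "")
        body = b"members page" if logged_in else b"landing page"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), FakeSite)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# One Session for the whole workflow: the login and every later request.
s = requests.Session()
s.trust_env = False  # demo only: ignore proxy env vars so 127.0.0.1 is reached directly
s.post(base + "/login/", data={"username": "email", "password": "password"})
with_session = s.get(base + "/cat/").text  # login cookie is re-sent automatically

other = requests.Session()  # a separate session never saw the login
other.trust_env = False
without_session = other.get(base + "/cat/").text

print(with_session, "|", without_session)  # members page | landing page

s.close()
other.close()
server.shutdown()
server.server_close()
```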

    You can find the login request URL by logging in with the developer tools open and watching the network traffic.
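    If the login happens to be an ordinary HTML form, its action URL and exact input names can also be read from the page source, which is a quick way to double-check the payload keys. This is a different, complementary technique sketched with the standard library's html.parser; the sample markup is invented, and a login request built entirely in JavaScript will not appear this way, so the Network tab remains the reliable source.

```python
from html.parser import HTMLParser


class LoginFormFinder(HTMLParser):
    """Collect each <form>'s action/method and the names of the <input>s inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Record the field name with its default value ("" if none).
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# Invented markup standing in for the real landing page.
sample = """
<form action="/login/" method="POST">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" name="sub_login" value="Account Login">
</form>
"""

finder = LoginFormFinder()
finder.feed(sample)
print(list(finder.forms[0]["fields"]))  # ['username', 'password', 'sub_login']
```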
