Question
I've been trying to log in to Instagram using the Requests library, but I can't get it to work. The connection always gets refused.
import requests
#Creating URL, usr/pass and user agent variables
BASE_URL = 'https://www.instagram.com/'
LOGIN_URL = BASE_URL + 'accounts/login/ajax/'
USERNAME = '******'
PASSWD = '******'
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)\
Chrome/59.0.3071.115 Safari/537.36'
#Setting some headers and refers
session = requests.Session()
session.headers = {'user-agent': USER_AGENT}
session.headers.update({'Referer': BASE_URL})
try:
    # Requesting the base url. Grabbing and inserting the csrftoken
    req = session.get(BASE_URL)
    session.headers.update({'X-CSRFToken': req.cookies['csrftoken']})
    login_data = {'username': USERNAME, 'password': PASSWD}
    # Finally logging in
    login = session.post(LOGIN_URL, data=login_data, allow_redirects=True)
    session.headers.update({'X-CSRFToken': login.cookies['csrftoken']})
    cookies = login.cookies
    # Print the html results after I've logged in
    print(login.text)
# In case of refused connection
except requests.exceptions.ConnectionError:
    print("Connection refused")
I don't know what I'm doing wrong. I would really appreciate it if anyone could post a solution. Please do not suggest the API or Selenium (they're not an option for me at the moment).
Answer 1:
Since requests doesn't execute JavaScript, you don't have the CSRF token in your cookies.
If you look at the page content, you can find the csrf_token inside the HTML.
Using bs4 and json, you can extract it and use it in your POST request.
from bs4 import BeautifulSoup
import json, random, re, requests
BASE_URL = 'https://www.instagram.com/accounts/login/'
LOGIN_URL = BASE_URL + 'ajax/'
headers_list = [
    "Mozilla/5.0 (Windows NT 5.1; rv:41.0) Gecko/20100101"
    " Firefox/41.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2)"
    " AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2"
    " Safari/601.3.9",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0)"
    " Gecko/20100101 Firefox/15.0.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    " (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36"
    " Edge/12.246",
]
USERNAME = '****'
PASSWD = '*****'
USER_AGENT = random.choice(headers_list)
session = requests.Session()
session.headers = {'user-agent': USER_AGENT}
session.headers.update({'Referer': BASE_URL})
req = session.get(BASE_URL)
soup = BeautifulSoup(req.content, 'html.parser')
body = soup.find('body')
pattern = re.compile('window._sharedData')
script = body.find("script", text=pattern)
script = script.get_text().replace('window._sharedData = ', '')[:-1]
data = json.loads(script)
csrf = data['config'].get('csrf_token')
login_data = {'username': USERNAME, 'password': PASSWD}
session.headers.update({'X-CSRFToken': csrf})
login = session.post(LOGIN_URL, data=login_data, allow_redirects=True)
print(login.content)
# b'{"authenticated": true, "user": true, "userId": "*******", "oneTapPrompt": false, "status": "ok"}'
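The two parsing steps above (pulling csrf_token out of the page and reading the login response) can be tried offline without touching the network. The following is only a minimal sketch: it swaps bs4 for a plain regular expression, the helper names extract_csrf_token and is_authenticated are hypothetical, and SAMPLE_HTML and SAMPLE_RESPONSE are made-up stand-ins that assume the page still embeds window._sharedData in the shape used above.

```python
import json
import re

# Matches `window._sharedData = {...};` at the end of a <script> tag.
# Assumption: the JSON object is followed directly by `;</script>`.
_SHARED_DATA_RE = re.compile(
    r'window\._sharedData\s*=\s*(\{.*?\})\s*;\s*</script>', re.DOTALL
)


def extract_csrf_token(html):
    """Return config.csrf_token from the embedded JSON, or None if absent."""
    match = _SHARED_DATA_RE.search(html)
    if match is None:
        return None
    try:
        shared_data = json.loads(match.group(1))
    except ValueError:
        return None
    config = shared_data.get('config') or {}
    return config.get('csrf_token')


def is_authenticated(body):
    """Return True only if the login response body has "authenticated": true."""
    try:
        result = json.loads(body)
    except ValueError:
        return False
    return isinstance(result, dict) and result.get('authenticated') is True


# Made-up sample inputs, shaped like the page and response shown above.
SAMPLE_HTML = (
    '<html><body><script type="text/javascript">'
    'window._sharedData = {"config": {"csrf_token": "abc123"},'
    ' "nested": {"a": {"b": 1}}};'
    '</script></body></html>'
)
SAMPLE_RESPONSE = b'{"authenticated": true, "user": true, "status": "ok"}'

print(extract_csrf_token(SAMPLE_HTML))    # abc123
print(is_authenticated(SAMPLE_RESPONSE))  # True
```

Returning None or False instead of raising makes it easier to tell "the page layout changed" apart from "the connection was refused", which the original try/except could not distinguish.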
Keep in mind that most of the data on Instagram is loaded with JavaScript, so you may run into more trouble in the future.
You can refer to this post on how to recover data: https://stackoverflow.com/a/49831347
Or you can use a different library such as dryscrape or spynner.
Source: https://stackoverflow.com/questions/50316885/i-cant-login-to-instagram-with-requests