Using C# HttpClient to login on a website and scrape information from another page

Submitted by 非 Y 不嫁゛ on 2019-12-03 00:51:53

You should simulate the login process of the web site. The easiest way to do this is to inspect the site's traffic with a debugging proxy (for example, Fiddler).

Here is the login request of the web site:

POST https://members.morningstar.com/memberservice/login.aspx?CustId=&CType=&CName=&RememberMe=true&CookieTime= HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Referer: https://members.morningstar.com/memberservice/login.aspx
** omitted **
Cookie: cookies=true; TestCookieExist=Exist; fp=001140581745182496; __utma=172984700.91600904.1405817457.1405817457.1405817457.1; __utmb=172984700.8.10.1405817457; __utmz=172984700.1405817457.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=172984700; ASP.NET_SessionId=b5bpepm3pftgoz55to3ql4me

email_textbox=test@email.com&pwd_textbox=password&remember=on&email_textbox2=&go_button.x=36&go_button.y=16&__LASTFOCUS=&__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=omitted&__EVENTVALIDATION=omited

When you inspect this, you'll see some cookies and form fields like "__VIEWSTATE". You'll need the actual values of these fields to log in. You can use the following steps:

  1. Make a request and scrape the fields "__LASTFOCUS", "__EVENTTARGET", "__EVENTARGUMENT", "__VIEWSTATE", and "__EVENTVALIDATION", along with the cookies.
  2. Create a new POST request to the same page, reusing the CookieContainer from the previous one; build a post string from the scraped fields, the username, and the password. Post it with the MIME type application/x-www-form-urlencoded.
  3. If the login succeeds, use the cookies for further requests to stay logged in.
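The three steps above can be sketched with HttpClient roughly as follows. This is a minimal sketch, not a tested implementation: the form field names are the ones from the captured request above, the credentials are placeholders, and the regex-based field extraction is a simplification.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class LoginScraper
{
    static async Task Main()
    {
        // One CookieContainer for the whole session: cookies set by the
        // server are replayed automatically on every later request.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using var client = new HttpClient(handler);

        const string loginUrl =
            "https://members.morningstar.com/memberservice/login.aspx";

        // Step 1: GET the login page to collect cookies and the hidden
        // ASP.NET fields the server expects back.
        string loginPage = await client.GetStringAsync(loginUrl);
        string viewState = ExtractHiddenField(loginPage, "__VIEWSTATE");
        string eventValidation = ExtractHiddenField(loginPage, "__EVENTVALIDATION");

        // Step 2: POST the form with the scraped values plus the credentials.
        // FormUrlEncodedContent sets the Content-Type header to
        // application/x-www-form-urlencoded automatically.
        var form = new Dictionary<string, string>
        {
            ["email_textbox"] = "test@email.com",   // placeholder credentials
            ["pwd_textbox"] = "password",
            ["remember"] = "on",
            ["__LASTFOCUS"] = "",
            ["__EVENTTARGET"] = "",
            ["__EVENTARGUMENT"] = "",
            ["__VIEWSTATE"] = viewState,
            ["__EVENTVALIDATION"] = eventValidation,
        };
        var response = await client.PostAsync(loginUrl, new FormUrlEncodedContent(form));
        response.EnsureSuccessStatusCode();

        // Step 3: the CookieContainer now holds the session cookie, so any
        // further request made through the same client stays logged in.
    }

    // Naive extraction of a hidden input's value; an HTML parser such as
    // HtmlAgilityPack is more robust than a regex.
    public static string ExtractHiddenField(string html, string name)
    {
        var match = Regex.Match(html, $"id=\"{name}\"[^>]*value=\"([^\"]*)\"");
        return match.Success ? match.Groups[1].Value : "";
    }
}
```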

Note: You can use HtmlAgilityPack or ScrapySharp to scrape the HTML. ScrapySharp provides easy-to-use tools for posting forms and browsing websites.
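For the parsing part, here is a minimal HtmlAgilityPack sketch (assuming the HtmlAgilityPack NuGet package is installed) that collects every hidden input on the login page, so the POST body can simply echo them all back:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class HiddenFieldScraper
{
    // Returns name -> value for every <input type="hidden"> in the page,
    // which covers __VIEWSTATE, __EVENTVALIDATION, and friends.
    public static Dictionary<string, string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();
        var nodes = doc.DocumentNode.SelectNodes("//input[@type='hidden']");
        if (nodes != null) // SelectNodes returns null when nothing matches
        {
            foreach (var input in nodes)
            {
                fields[input.GetAttributeValue("name", "")] =
                    input.GetAttributeValue("value", "");
            }
        }
        return fields;
    }
}
```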

The mental process is to simulate a person logging in to the website. Some logins are made with AJAX, others with a traditional POST request, so the first thing you need to do is make that request the way the browser does. In the server response you will get cookies, headers, and other information; you then use that info to build the next request, which does the scraping.

Steps are:

1) Build a request, like the browser does, to authenticate yourself to the app.
2) Inspect the response and save the headers, cookies, or other info you need to persist your session with the server.
3) Make another request to the server, using the info you gathered in step 2.
4) Inspect the response and use HTML parsing or another extraction technique to pull out the data.
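For step 1, "like the browser does" mostly means sending the same headers the browser sent. A minimal HttpClient sketch — the Accept and Referer values come from the captured request above, and the User-Agent string is just an illustrative example:

```csharp
using System;
using System.Net.Http;

using var client = new HttpClient();

// Mirror the headers of the captured browser request so the server
// treats this request like the real one.
client.DefaultRequestHeaders.TryAddWithoutValidation(
    "Accept", "text/html, application/xhtml+xml, */*");
client.DefaultRequestHeaders.Referrer =
    new Uri("https://members.morningstar.com/memberservice/login.aspx");
client.DefaultRequestHeaders.TryAddWithoutValidation(
    "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"); // example UA
```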

Tips:

A plain HTTP client does not run a JavaScript engine, and some websites use JavaScript to render graphs or perform interactions on the DOM. In those cases, you may need a headless-browser solution such as a WebKit wrapper library.
