Using C# HttpClient to log in to a website and scrape information from another page

天涯浪人 2021-02-04 22:34

I am trying to use C# and Chrome Web Inspector to log in to http://www.morningstar.com and retrieve some information from the page http://financials.morningstar.com/income-statemen

2 answers
  •  我寻月下人不归
    2021-02-04 23:15

    You need to simulate the site's login process. The easiest way to learn what it requires is to inspect the site's traffic with an HTTP debugger (for example, Fiddler).

    Here is the site's login request:

    POST https://members.morningstar.com/memberservice/login.aspx?CustId=&CType=&CName=&RememberMe=true&CookieTime= HTTP/1.1
    Accept: text/html, application/xhtml+xml, */*
    Referer: https://members.morningstar.com/memberservice/login.aspx
    ** omitted **
    Cookie: cookies=true; TestCookieExist=Exist; fp=001140581745182496; __utma=172984700.91600904.1405817457.1405817457.1405817457.1; __utmb=172984700.8.10.1405817457; __utmz=172984700.1405817457.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=172984700; ASP.NET_SessionId=b5bpepm3pftgoz55to3ql4me
    
    email_textbox=test@email.com&pwd_textbox=password&remember=on&email_textbox2=&go_button.x=36&go_button.y=16&__LASTFOCUS=&__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=omitted&__EVENTVALIDATION=omitted
    

    When you inspect this, you'll see some cookies and form fields such as "__VIEWSTATE". You'll need the actual values of these fields to log in. You can use the following steps:

    1. Make a request to the login page and scrape the form fields "__LASTFOCUS", "__EVENTTARGET", "__EVENTARGUMENT", "__VIEWSTATE" and "__EVENTVALIDATION", along with the cookies.
    2. Create a new POST request to the same page, reusing the CookieContainer from the first request; build the POST body from the scraped fields plus the username and password, and send it with the content type application/x-www-form-urlencoded.
    3. If the login succeeds, use the same cookies for further requests to stay logged in.
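
    The three steps above could be sketched roughly as follows. This is a minimal, untested-against-the-live-site sketch, not a definitive implementation: the class and method names (LoginSketch, ExtractHiddenFields, LoginAsync) are made up for illustration, the URL and form field names are copied from the captured request above and may have changed since, and the hidden fields are pulled out with a simple regex as a stand-in for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class LoginSketch
{
    // Taken from the captured request above (query string dropped); assumed to still be valid.
    const string LoginUrl = "https://members.morningstar.com/memberservice/login.aspx";

    // Collects name/value pairs from <input type="hidden"> tags.
    // Regex-based for brevity; an HTML parser would be more robust.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in Regex.Matches(html, "<input\\b[^>]*>", RegexOptions.IgnoreCase))
        {
            string t = tag.Value;
            if (!Regex.IsMatch(t, "\\btype\\s*=\\s*[\"']hidden[\"']", RegexOptions.IgnoreCase))
                continue;

            Match name = Regex.Match(t, "\\bname\\s*=\\s*[\"']([^\"']*)[\"']", RegexOptions.IgnoreCase);
            Match value = Regex.Match(t, "\\bvalue\\s*=\\s*[\"']([^\"']*)[\"']", RegexOptions.IgnoreCase);
            if (name.Success)
            {
                // A hidden field without a value attribute is posted as an empty string.
                string raw = value.Success ? value.Groups[1].Value : "";
                fields[name.Groups[1].Value] = WebUtility.HtmlDecode(raw);
            }
        }
        return fields;
    }

    public static async Task<HttpClient> LoginAsync(string email, string password)
    {
        // One CookieContainer shared by every request made through this client.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        var client = new HttpClient(handler);

        // Step 1: fetch the login page to obtain the session cookies and hidden fields.
        string html = await client.GetStringAsync(LoginUrl);
        Dictionary<string, string> form = ExtractHiddenFields(html);

        // Step 2: add the credentials and the other visible fields seen in the capture.
        form["email_textbox"] = email;
        form["pwd_textbox"] = password;
        form["remember"] = "on";
        form["go_button.x"] = "36"; // click coordinates copied from the capture
        form["go_button.y"] = "16";

        // FormUrlEncodedContent sets Content-Type: application/x-www-form-urlencoded.
        HttpResponseMessage response = await client.PostAsync(LoginUrl, new FormUrlEncodedContent(form));

        // This only checks the HTTP status; a rejected login may still return 200,
        // so inspect the response body or cookies to confirm success.
        response.EnsureSuccessStatusCode();

        // Step 3: the returned client carries the login cookies on later requests.
        return client;
    }
}
```

    The client returned by LoginAsync can then be used for the page you actually want, for example with client.GetStringAsync(url), since it keeps sending the cookies collected during login.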

    Note: You can use HtmlAgilityPack or ScrapySharp to scrape the HTML. ScrapySharp provides easy-to-use tools for posting forms and browsing websites.
