I want to be able to log into a website programmatically and periodically obtain some information from the site. What tool or tools would make this as simple as possible? I'd prefer a Python library of some type because I want to become more proficient in Python, but I'm open to any suggestions.
You can try Mechanize (http://wwwsearch.sourceforge.net/mechanize/) for programmatic web-browsing, and definitely use Beautiful Soup (http://www.crummy.com/software/BeautifulSoup/) for the scraping.
Most of us use urllib2 to get the page; it can handle various forms of authentication and cookie collection. Then we use Beautiful Soup to parse the results.
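As a rough illustration of the cookie handling mentioned above, here is a minimal sketch using only the standard library. In Python 3 the urllib2 functionality lives in urllib.request, so that is what the sketch uses. The URL, the field names, and the helper names make_opener and build_login_request are assumptions made up for this example; a real site will expect its own fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that remembers cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Encode form fields as a URL-encoded POST body for the login page."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(login_url, data=body, method="POST")


if __name__ == "__main__":
    opener, jar = make_opener()
    request = build_login_request(
        "https://example.com/login",  # hypothetical URL
        {"username": "alice", "password": "secret"},  # hypothetical field names
    )
    # opener.open(request) would send the POST. Cookies set by the server
    # are stored in `jar` and sent again on later opener.open() calls.
    print(request.get_method(), request.full_url)
```

After a successful login, fetching the pages you want with the same opener should carry the session cookie along, and the response body can then be handed to Beautiful Soup.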
I once wrote a Python script to automatically log into vBulletin forums. The difficult part was knowing how to correctly form the login request, and that is something a library won't help you with. I found Live HTTP Headers, an add-on for Firefox, to be pretty helpful in seeing what is sent between the client and server during the login process.
I also agree with everyone else that Beautiful Soup is pretty awesome.
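One piece of forming the login request is finding out which fields the login form submits, including hidden ones. The sketch below collects them with the standard library's html.parser. The sample HTML, its field names, and the names FormFieldParser and extract_form_fields are invented for illustration; this is not vBulletin's actual form, and the traffic captured in the browser remains the real reference.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from every named <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:  # inputs without a name are not submitted
            self.fields[name] = attributes.get("value") or ""


def extract_form_fields(html):
    """Return a dict of the input fields found in the given HTML."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical login page, made up for this example.
SAMPLE_LOGIN_PAGE = """
<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Log in">
</form>
"""

fields = extract_form_fields(SAMPLE_LOGIN_PAGE)
# Fill in the credentials; the hidden token is kept as the page supplied it.
fields.update({"username": "alice", "password": "secret"})
print(fields)
```

Comparing the resulting dictionary with what the browser actually sends is a quick way to spot fields that are added or altered by JavaScript.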
I recommend using twill; it makes the login procedure a snap. Then use Beautiful Soup etc. as described above. I've never tried Mechanize, but it looks pretty good.
Just for screen scraping, you can use a combination of urllib + pyquery. https://pythonhosted.org/pyquery/
Source: https://stackoverflow.com/questions/832673/what-is-the-best-way-to-programmatically-log-into-a-web-site-in-order-to-screen