Grabbing text from a webpage

穿精又带淫゛_ 提交于 2019-11-29 23:41:15

问题


I would like to write a program that will find bus stop times and update my personal webpage accordingly.

If I were to do this manually I would

  1. Visit www.calgarytransit.com
  2. Enter a stop number. ie) 9510
  3. Click the button "next bus"

The results may look like the following:

10:16p Route 154
10:46p Route 154
11:32p Route 154

Once I've grabbed the time and routes then I will update my webpage accordingly.

I have no idea where to start. I know diddly squat about web programming but can write some C and Python. What are some topics/libraries I could look into?


回答1:


Beautiful Soup is a Python library designed for parsing web pages. Between it and urllib2 (urllib.request in Python 3) you should be able to figure out what you need.




回答2:


What you're asking about is called "web scraping." I'm sure if you google around you'll find some stuff, but the core notion is that you want to open a connection to the website, slurp in the HTML, parse it and identify the chunks you want.

The Python Wiki has a good lot of stuff on this.




回答3:


Since you write in C, you may want to check out cURL; in particular, take a look at libcurl. It's great.




回答4:


You can use the mechanize library that is available for Python http://wwwsearch.sourceforge.net/mechanize/




回答5:


You can use Perl to help you complete your task.

use strict;
use LWP;

my $browser = LWP::UserAgent->new;

my $responce = $browser->get("http://google.com");
print $responce->content;

Your responce object can tell you if it suceeded as well as returning the content of the page.You can also use this same library to post to a page.

Here is some documentation. http://metacpan.org/pod/LWP::UserAgent




回答6:


That site doesnt offer an API for you to be able to get the appropriate data that you need. In that case you'll need to parse the actual HTML page returned by, for example, a CURL request .




回答7:


This is called Web scraping, and it even has its own Wikipedia article where you can find more information.

Also, you might find more details in this SO discussion.




回答8:


As long as the layout of the web page your trying to 'scrape' doesnt regularly change, you should be able to parse the html with any modern day programming language.



来源:https://stackoverflow.com/questions/419260/grabbing-text-from-a-webpage

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!