Fetch a Wikipedia article with Python

前端 未结 10 1888
余生分开走
余生分开走 2020-11-27 15:37

I try to fetch a Wikipedia article with Python\'s urllib:

f = urllib.urlopen(\"http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes\")         


        
10条回答
  •  粉色の甜心
    2020-11-27 16:17

    Requesting the page with ?printable=yes gives you an entire relatively clean HTML document. ?action=render gives you just the body HTML. Requesting to parse the page through the MediaWiki action API with action=parse likewise gives you just the body HTML but would be good if you want finer control, see parse API help.

    If you just want the page HTML so you can render it, it's faster and better is to use the new RESTBase API, which returns a cached HTML representation of the page. In this case, https://en.wikipedia.org/api/rest_v1/page/html/Albert_Einstein.

    As of November 2015, you don't have to set your user-agent, but it's strongly encouraged. Also, nearly all Wikimedia wikis require HTTPS, so avoid a 301 redirect and make https requests.

提交回复
热议问题