wikipedia

How to get all article pages under a Wikipedia Category and its sub-categories?

冷眼眸甩不掉的悲伤 submitted on 2019-11-27 02:07:54
Question: I want to get the names of all articles under a category and its subcategories. Options I'm aware of: using the Wikipedia API (does it have such an option?), or downloading the dump (which format would be better for my usage?). There is also an option to search Wikipedia with something like incategory:"music", but I didn't see a way to view the results as XML. Please share your thoughts. Answer 1: The following resource will help you to download all pages from the category and all its subcategories: http://en
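Since the linked resource is cut off above, here is a minimal sketch of the API route instead, assuming the English Wikipedia Action API endpoint and its `list=categorymembers` module. It pages through each category with the returned continuation tokens and walks subcategories breadth-first. The `USER_AGENT` string, the injectable `fetch` parameter, and the `max_depth` cutoff are illustrative assumptions (category trees can be very deep and even cyclic, so some limit is sensible).

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifier: Wikimedia asks API clients to send a descriptive User-Agent.
USER_AGENT = "category-crawler-example/0.1 (contact: you@example.org)"


def fetch_json(params):
    """GET the Action API with the given parameters and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_members(category, fetch=fetch_json):
    """Yield every page or subcategory of one category, following continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        data = fetch(params)
        yield from data["query"]["categorymembers"]
        if "continue" not in data:
            break
        # Merge the continuation tokens (e.g. cmcontinue) into the next request.
        params = {**params, **data["continue"]}


def all_articles(root, fetch=fetch_json, max_depth=3):
    """Breadth-first walk of root and its subcategories; returns sorted article titles."""
    seen_cats = {root}
    articles = set()
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        for member in category_members(cat, fetch):
            if member["ns"] == 14:  # namespace 14 = Category
                if depth < max_depth and member["title"] not in seen_cats:
                    seen_cats.add(member["title"])
                    queue.append((member["title"], depth + 1))
            elif member["ns"] == 0:  # namespace 0 = articles
                articles.add(member["title"])
    return sorted(articles)
```

For example, `all_articles("Category:Music", max_depth=1)` would list articles directly in that category plus those one level down. The `seen_cats` set keeps the walk from looping when subcategories point back at an ancestor.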

How to get the Infobox data from Wikipedia?

江枫思渺然 submitted on 2019-11-27 01:44:59
Question: If I have the URL of a page, how would I obtain the Infobox information on the right using MediaWiki web services? Answer 1: Maybe a little late, but I wanted the same thing and didn't see any easy solutions here, but, as Bryan points out, it turns out not to be too difficult to use the MediaWiki API with this library: https://github.com/siznax/wptools
Usage:
>>> import wptools
>>> so = wptools.page('Stack Overflow').get_parse()
>>> so.infobox
{'alexa': '{{DecreasePositive}}', 'author': '[[Joel Spolsky]] and [[Jeff Atwood]]', 'caption': 'Screenshot of Stack Overflow as of February 2015', 'commercial':
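If you would rather not add a dependency, one rough alternative is to fetch the article's raw wikitext (for example with `action=parse&prop=wikitext`) and pull the parameters out of the `{{Infobox ...}}` template yourself. The sketch below is a hypothetical, naive helper, not how wptools works internally: it only looks for the first case-sensitive `{{Infobox` and tracks nesting of `{{...}}` and `[[...]]` so that pipes inside nested templates and links are not treated as parameter separators. Values are returned as raw wikitext.

```python
def parse_infobox(wikitext):
    """Return {parameter: value} for the first {{Infobox ...}} template in wikitext.

    Naive sketch: tracks nesting of {{templates}} and [[links]] so that pipes
    inside them are not treated as parameter separators.
    """
    start = wikitext.find("{{Infobox")
    if start == -1:
        return {}
    fields, buf = [], []
    tpl_depth = link_depth = 0
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            tpl_depth += 1
            if tpl_depth > 1:  # keep nested templates verbatim
                buf.append(pair)
            i += 2
        elif pair == "}}":
            tpl_depth -= 1
            if tpl_depth == 0:  # end of the infobox itself
                fields.append("".join(buf))
                break
            buf.append(pair)
            i += 2
        elif pair in ("[[", "]]"):
            link_depth += 1 if pair == "[[" else -1
            buf.append(pair)
            i += 2
        else:
            ch = wikitext[i]
            if ch == "|" and tpl_depth == 1 and link_depth == 0:
                fields.append("".join(buf))  # a top-level pipe starts a new parameter
                buf = []
            else:
                buf.append(ch)
            i += 1
    infobox = {}
    for field in fields[1:]:  # fields[0] is the template name, e.g. "Infobox website"
        key, sep, value = field.partition("=")
        if sep:  # skip positional (unnamed) parameters
            infobox[key.strip()] = value.strip()
    return infobox
```

Splitting each field on its first `=` keeps values such as URLs with query strings intact. Anything fancier (template expansion, lowercase `{{infobox`, triple-brace parameters) is exactly what a library like wptools saves you from.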

Fetch excerpt from Wikipedia article?

亡梦爱人 submitted on 2019-11-27 01:40:39
Question: I've been up and down the Wikipedia API, but I can't figure out whether there's a nice way to fetch the excerpt of an article (usually the first paragraph). It would be nice to get the HTML formatting of that paragraph, too. The only way I currently see of getting something that resembles a snippet is by performing a full-text search (example), but that's not really what I want (too short). Is there any other way to fetch the first paragraph of a Wikipedia article than barbarically parsing HTML
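One approach, assuming the wiki has the TextExtracts extension (English Wikipedia does): request `prop=extracts&exintro` without `explaintext`, which returns the intro section as a small HTML fragment. That fragment can start with empty placeholder paragraphs, so the hypothetical helper below returns the first `<p>` element that contains visible text. It uses regular expressions, which is tolerable only because the extract is short and simply structured; it is not a general HTML parser.

```python
import re

# A <p> element (with or without attributes) and its inner HTML.
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def first_html_paragraph(extract_html):
    """Return the first <p>...</p> element with visible text, or "" if none."""
    for match in PARAGRAPH.finditer(extract_html):
        if TAG.sub("", match.group(1)).strip():  # ignore empty placeholder paragraphs
            return match.group(0)
    return ""
```

The returned string keeps its inline markup (bold, italics, links), which is what you want if the paragraph is going straight into a web page.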

Fetch a Wikipedia article with Python

不打扰是莪最后的温柔 submitted on 2019-11-27 00:50:56
Question: I'm trying to fetch a Wikipedia article with Python's urllib:
f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")
s = f.read()
f.close()
However, instead of the HTML page I get the following response: Error - Wikimedia Foundation: Request: GET http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes, from 192.35.17.11 via knsq1.knams.wikimedia.org (squid/2.6.STABLE21) to () Error: ERR_ACCESS_DENIED, errno [No Error] at Tue, 23 Sep 2008 09:09:08 GMT
Wikipedia seems to block requests that are not from a standard browser. Does anybody know how to work
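The usual explanation is that the request went out with urllib's default User-Agent, which Wikimedia refuses; sending your own descriptive User-Agent header is the commonly cited fix. Below is a Python 3 sketch (`urllib.urlopen` from the question became `urllib.request.urlopen`). The `USER_AGENT` value and the helper names are hypothetical; for structured data, the API is generally a better target than scraping the printable page.

```python
import urllib.parse
import urllib.request

# Hypothetical identifier: describe your tool and give a way to contact you.
USER_AGENT = "MyWikiFetcher/0.1 (https://example.org/contact)"


def build_request(title):
    """Build a request for the printable article page with an explicit User-Agent."""
    url = ("https://en.wikipedia.org/w/index.php?"
           + urllib.parse.urlencode({"title": title, "printable": "yes"}))
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_article_html(title):
    """Download the page and decode it as UTF-8 text."""
    with urllib.request.urlopen(build_request(title)) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_article_html("Albert_Einstein")` should return the printable HTML instead of the access-denied error page.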

Extract the first paragraph from a Wikipedia article (Python)

亡梦爱人 submitted on 2019-11-26 21:19:02
Question: How can I extract the first paragraph from a Wikipedia article using Python? For example, for Albert Einstein, that would be: Albert Einstein (pronounced /ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn]; 14 March 1879 – 18 April 1955) was a theoretical physicist, philosopher and author who is widely regarded as one of the most influential and iconic scientists and intellectuals of all time. A German-Swiss Nobel laureate, Einstein is often regarded as the father of modern physics.[2] He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially
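A sketch for the plain-text case, assuming you already have the intro text from TextExtracts (`prop=extracts&exintro&explaintext`) and that paragraphs in that text are separated by newlines. The helper name is hypothetical; it just returns the first non-blank paragraph.

```python
def first_paragraph(plain_extract):
    """Return the first non-empty paragraph of a newline-separated plain-text extract."""
    for para in plain_extract.split("\n"):
        if para.strip():
            return para.strip()
    return ""
```

Skipping blank lines guards against extracts that begin with an empty line.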

Is there a clean Wikipedia API just for retrieving a content summary?

爱⌒轻易说出口 submitted on 2019-11-26 18:00:24
Question: I just need to retrieve the first paragraph of a Wikipedia page. The content must be HTML-formatted, ready to be displayed on my websites (so no BBCode or Wikipedia-specific markup!). Answer: There's a way to get the entire "intro section" without any HTML parsing! Similar to AnthonyS's answer, but with an additional explaintext param, you can get the intro section as plain text. Query getting Stack Overflow's intro in plain text: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Stack%20Overflow JSON response (warnings stripped): { "query": { "pages":
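The query from the answer can also be assembled in code rather than pasted as a string. In the default JSON format, `query.pages` is an object keyed by page ID, and a title that does not exist comes back under a negative ID with a `missing` key, so the sketch below skips those. Passing `"1"` for the boolean flags `exintro` and `explaintext` is an assumption that matches the API's rule that a boolean parameter is true whenever it is present; the helper names are hypothetical.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"


def intro_text_url(title):
    """Rebuild the answer's query: plain-text intro section, following redirects."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": "1",      # boolean flag: presence alone enables it
        "explaintext": "1",  # boolean flag: strip HTML, return plain text
        "redirects": "1",
        "titles": title,
    }
    # quote_via=quote encodes spaces as %20, matching the URL in the answer.
    return API_URL + "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)


def intro_texts(data):
    """Map title -> extract from a JSON response whose pages are keyed by page ID."""
    result = {}
    for page_id, page in data["query"]["pages"].items():
        if "missing" in page or int(page_id) < 0:
            continue  # nonexistent titles come back with a negative ID
        result[page["title"]] = page.get("extract", "")
    return result
```

Dropping the `explaintext` entry gives the HTML version of the same intro, which is what the question actually asks for.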

How to use the Wikipedia API, if it exists? [closed]

空扰寡人 submitted on 2019-11-26 10:07:08
Question: I'm trying to find out if there's a Wikipedia API (I think it is related to MediaWiki?). If so, I would like to know how I would tell Wikipedia to give me an article about the New York Yankees, for example. What would the REST URL be for this example? All the docs on this subject seem fairly complicated. Answer 1: You really, really need to spend some time reading the documentation, as it only took me a moment to look and click on the link to fix it. :/ But out of sympathy I'll provide you a link
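For the New York Yankees example, one URL that works with the MediaWiki Action API is an `action=parse` request for the page's wikitext; swapping `prop=wikitext` for `prop=text` returns rendered HTML instead. The sketch below only builds that URL, and the helper name is hypothetical.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"


def article_wikitext_url(title):
    """Action API URL returning the current wikitext of one article as JSON."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",  # use "text" for rendered HTML
        "redirects": "1",    # resolve the title if it is a redirect
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)
```

`article_wikitext_url("New York Yankees")` yields `https://en.wikipedia.org/w/api.php?action=parse&page=New+York+Yankees&prop=wikitext&redirects=1&format=json`, which you can open in a browser to inspect the JSON.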

How to get the Infobox data from Wikipedia?

百般思念 submitted on 2019-11-26 09:45:27
Question: If I have the URL of a page, how would I obtain the Infobox information on the right using MediaWiki web services? Answer 1: Maybe a little late, but I wanted the same thing and didn't see any easy solutions here, but, as Bryan points out, it turns out not to be too difficult to use the MediaWiki API with this library: https://github.com/siznax/wptools
Usage:
>>> import wptools
>>> so = wptools.page('Stack Overflow').get_parse()
>>> so.infobox
{'alexa': '{{DecreasePositive}}', 'author': '[[Joel

Fetch a Wikipedia article with Python

半城伤御伤魂 submitted on 2019-11-26 09:28:23
Question: I'm trying to fetch a Wikipedia article with Python's urllib:
f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")
s = f.read()
f.close()
However, instead of the HTML page I get the following response: Error - Wikimedia Foundation: Request: GET http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes, from 192.35.17.11 via knsq1.knams.wikimedia.org (squid/2.6.STABLE21) to () Error: ERR_ACCESS_DENIED, errno [No Error] at Tue, 23 Sep 2008 09

Extract the first paragraph from a Wikipedia article (Python)

孤人 submitted on 2019-11-26 07:56:13
Question: How can I extract the first paragraph from a Wikipedia article using Python? For example, for Albert Einstein, that would be: Albert Einstein (pronounced /ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn]; 14 March 1879 – 18 April 1955) was a theoretical physicist, philosopher and author who is widely regarded as one of the most influential and iconic scientists and intellectuals of all time. A German-Swiss Nobel laureate, Einstein is often regarded as the father of modern physics.[2