wikipedia-api

Parsing a Wikipedia dump

℡╲_俬逩灬. Submitted on 2019-11-27 14:02:42
For example, using this Wikipedia dump: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=true&format=xmlfm Is there an existing library for Python that I can use to create an array with the mapping of subjects and values? For example: {height_ft, 6}, {nationality, American}

It looks like you really want to be able to parse MediaWiki markup. There is a Python library designed for this purpose called mwlib. You can use Python's built-in XML packages to extract the page content from the API's response, then pass that content into mwlib's parser.
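To make the answer concrete, here is a rough sketch of that two-step approach (fetch the wikitext via the API, then parse it). It uses the third-party requests library for the HTTP call and, since mwlib's own parsing calls are not shown in the answer, falls back to a simple regular expression to pull "| key = value" lines out of the infobox; the field names height_ft and nationality come from the question.

```python
import re
import xml.etree.ElementTree as ET

import requests

API = "https://en.wikipedia.org/w/api.php"

# Fetch the raw wikitext of the article (format=xml is the machine-readable form;
# the xmlfm format in the question is the HTML "pretty" version for browsers).
params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "redirects": "true",
    "format": "xml",
    "titles": "LeBron James",
}
resp = requests.get(API, params=params, headers={"User-Agent": "infobox-sketch/0.1"})
root = ET.fromstring(resp.content)

# The revision content sits in a <rev> element; its text is MediaWiki markup.
rev = root.find(".//rev")
wikitext = rev.text if rev is not None else ""

# Very rough extraction of "| key = value" infobox lines. A real wikitext parser
# (mwlib, mwparserfromhell, ...) handles nested templates and multi-line values.
fields = dict(re.findall(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, re.MULTILINE))
print(fields.get("height_ft"), fields.get("nationality"))
```

The regex is only meant to show the shape of the data; a proper parser is the right tool once the wikitext has been extracted.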

Get Text Content from mediawiki page via API

我的梦境 Submitted on 2019-11-27 11:01:14
I'm quite new to MediaWiki, and now I have a bit of a problem. I have the title of some wiki page, and I want to get just the text of that page using api.php, but all that I have found in the API is a way to obtain the wiki content of the page (with wiki markup). I used this HTTP request: /api.php?action=query&prop=revisions&rvlimit=1&rvprop=content&format=xml&titles=test But I need only the textual content, without the wiki markup. Is that possible with the MediaWiki API?

I don't think it is possible to get just the text using the API. What has worked for me was to request the HTML page …
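One thing the truncated answer does not mention: on Wikipedia the TextExtracts extension is installed, and its prop=extracts module can return plain text directly, so no HTML scraping is needed. It is designed for summaries, so very long pages may come back shortened. A minimal sketch, assuming the requests library and the page title "Test" from the question:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

# prop=extracts comes from the TextExtracts extension (available on Wikipedia);
# explaintext strips the markup so only plain text comes back.
params = {
    "action": "query",
    "prop": "extracts",
    "explaintext": 1,
    "format": "json",
    "titles": "Test",
}
data = requests.get(API, params=params, headers={"User-Agent": "plaintext-sketch/0.1"}).json()

# The classic JSON format keys pages by page id.
for page in data["query"]["pages"].values():
    print(page.get("extract", ""))
```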

How to get Wikipedia content using Wikipedia's API?

霸气de小男生 Submitted on 2019-11-27 10:28:06
I want to get the first paragraph of a Wikipedia article. What is the API query to do so?

See this section on the MediaWiki docs. These are the key parameters: prop=revisions&rvprop=content&rvsection=0. Setting rvsection=0 specifies that only the lead section should be returned. See this example: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=pizza To get the HTML, you can similarly use action=parse: http://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&page=pizza Note that you'll have to strip out any templates or infoboxes.

See Is there a …
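A small sketch of the rvsection=0 approach described above, assuming the requests library and the classic JSON layout where the wikitext of a revision sits under the "*" key:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

# rvsection=0 restricts prop=revisions to the lead (first) section, as described above.
params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "rvsection": 0,
    "format": "json",
    "titles": "Pizza",
}
data = requests.get(API, params=params, headers={"User-Agent": "lead-sketch/0.1"}).json()

for page in data["query"]["pages"].values():
    # Classic JSON layout: the wikitext is under the "*" key of the first revision.
    print(page["revisions"][0]["*"])
```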

Wikipedia API + Cross-origin requests

时光怂恿深爱的人放手 Submitted on 2019-11-27 07:39:42
I'm trying to access Wikipedia using JavaScript + CORS. As far as I know, Wikipedia should support CORS: http://www.mediawiki.org/wiki/API:Cross-site_requests I tried the following script: create an XMLHttpRequest with credentials / XDomainRequest, add some HTTP headers ("Access-Control-Allow-Credentials", ...) and send the query. http://jsfiddle.net/lindenb/Vr7RS/

var WikipediaCORS = {
    setMessage: function(msg) {
        var span = document.getElementById("id1");
        span.appendChild(document.createTextNode(msg));
    },
    // Create the XHR object.
    createCORSRequest: function(url) {
        var xhr = new XMLHttpRequest();
        if ( …

How to get Infobox from a Wikipedia article by Mediawiki API?

六月ゝ 毕业季﹏ Submitted on 2019-11-27 07:26:45
Wikipedia articles may have Infobox templates. With the following call I can get the first section of an article, which includes the Infobox: http://en.wikipedia.org/w/api.php?action=parse&pageid=568801&section=0&prop=wikitext What I want is a query which will return only the Infobox data. Is this possible?

You can do it with a URL call to the Wikipedia API like this: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0 Replace the titles= section with your page title, and change format=xmlfm to format=json if you …
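There is no API module that returns only the infobox, so the usual pattern is exactly what the answer describes: fetch section 0 and then pull the Infobox template out of the wikitext. A sketch of that, using the requests library plus the third-party mwparserfromhell parser (not mentioned in the thread, but built for this kind of template extraction):

```python
import mwparserfromhell  # third-party wikitext parser
import requests

API = "https://en.wikipedia.org/w/api.php"

# Fetch only the lead section; that is where the infobox lives.
params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "rvsection": 0,
    "format": "json",
    "titles": "Scary Monsters and Nice Sprites",
}
data = requests.get(API, params=params, headers={"User-Agent": "infobox-sketch/0.1"}).json()
page = next(iter(data["query"]["pages"].values()))
wikitext = page["revisions"][0]["*"]

# Keep only templates whose name starts with "Infobox" and dump their parameters.
for template in mwparserfromhell.parse(wikitext).filter_templates():
    if str(template.name).strip().lower().startswith("infobox"):
        for param in template.params:
            print(str(param.name).strip(), "=", str(param.value).strip())
```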

Accessing main picture of wikipedia page by API [closed]

谁说我不能喝 Submitted on 2019-11-27 06:03:08
Is there any way I can access the thumbnail picture of any Wikipedia page by using an API? I mean the image in the box on the top right side. Is there an API for that?

http://en.wikipedia.org/w/api.php Look at prop=images. It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g. action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url, or of calculating the URL via the filename's hash. Unfortunately, while the array of images returned by prop=images is in …
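A sketch of the two-call approach from the answer (prop=images to list the files on the page, then prop=imageinfo&iiprop=url to resolve them to URLs), using the requests library; "Albert Einstein" is just an example title. Note that prop=images lists files alphabetically and does not say which one is the infobox image; the PageImages extension's prop=pageimages module, not mentioned in the truncated answer, returns the page's representative thumbnail directly.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "thumbnail-sketch/0.1"}
title = "Albert Einstein"  # example article

# Step 1: list the image files used on the page (prop=images, as the answer suggests).
data = requests.get(API, params={
    "action": "query",
    "prop": "images",
    "format": "json",
    "titles": title,
}, headers=HEADERS).json()
page = next(iter(data["query"]["pages"].values()))
filenames = [img["title"] for img in page.get("images", [])]

# Step 2: resolve a few of those filenames to full URLs with prop=imageinfo&iiprop=url.
if filenames:
    info = requests.get(API, params={
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
        "titles": "|".join(filenames[:5]),
    }, headers=HEADERS).json()
    for p in info["query"]["pages"].values():
        for ii in p.get("imageinfo", []):
            print(p["title"], "->", ii["url"])
```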

No response from MediaWiki API using jQuery

 ̄綄美尐妖づ Submitted on 2019-11-27 04:07:05
I've tried to get some content from Wikipedia as JSON:

$.getJSON("http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=" + title + "&format=json", function(data) { doSomethingWith(data); });

But I got nothing in response. If I paste something like http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=jQuery&format=json into the browser's address bar, I get the expected content. What's wrong?

You need to trigger JSONP behavior with $.getJSON() by adding &callback=? to the query string, like this:

$.getJSON("http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=" + title + "&format=json&callback=?", function(data) { doSomethingWith(data); });

How to use wikipedia api if it exists? [closed]

爷，独闯天下 Submitted on 2019-11-27 02:36:51
I'm trying to find out if there's a Wikipedia API (I think it is related to MediaWiki?). If so, I would like to know how I would tell Wikipedia to give me an article about the New York Yankees, for example. What would the REST URL be for this example? All the docs on this subject seem fairly complicated.

You really, really need to spend some time reading the documentation, as it only took me a moment of looking and clicking on a link to work this out. :/ But out of sympathy I'll provide a link that maybe you can learn to use: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New …
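To show what using such a URL from code looks like, here is a small sketch with the requests library. It first resolves the free-text query with action=opensearch and then asks for that article's plain-text intro via prop=extracts; the search string comes from the question.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "yankees-sketch/0.1"}

# Resolve a free-text query to a canonical article title first (action=opensearch),
# then ask for that article's plain-text intro (prop=extracts from TextExtracts).
search = requests.get(API, params={
    "action": "opensearch",
    "search": "new york yankees",
    "limit": 1,
    "format": "json",
}, headers=HEADERS).json()
title = search[1][0]  # opensearch returns [query, titles, descriptions, urls]

data = requests.get(API, params={
    "action": "query",
    "prop": "extracts",
    "exintro": 1,
    "explaintext": 1,
    "format": "json",
    "titles": title,
}, headers=HEADERS).json()
for page in data["query"]["pages"].values():
    print(page["extract"])
```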

How do I get all articles about people from Wikipedia?

自闭症网瘾萝莉.ら Submitted on 2019-11-27 02:12:02
Question: What would be the easiest way to get all articles about people from Wikipedia? I know I can download a dump of all the pages, but then how do I filter those and get only the ones about people? I need as many as I can get (preferably more than a million), so using any sort of API is probably not an option.

Answer 1: Since articles about people usually contain the Persondata template, you can just search for all articles that contain Persondata. You can find a sample API query for doing just that …
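A rough sketch of the dump-filtering side of this, since the asker rules out the API: stream a pages-articles dump with the standard library and keep pages whose wikitext contains the Persondata marker mentioned in the answer. The dump filename is only an example, and note that the Persondata template has since been retired on the English Wikipedia, so on a current dump a different marker would be needed.

```python
import bz2
import xml.etree.ElementTree as ET

# Example path; adjust to wherever the pages-articles dump was downloaded.
DUMP = "enwiki-latest-pages-articles.xml.bz2"

def local(tag):
    # Dump elements carry a version-specific XML namespace; compare local names only.
    return tag.rsplit("}", 1)[-1]

people = []
title, text = None, ""
with bz2.open(DUMP, "rb") as f:
    for event, elem in ET.iterparse(f, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if "{{Persondata" in text:
                people.append(title)
            elem.clear()  # free the bulk of the memory; dumps are tens of GB uncompressed

print(len(people), "articles with a Persondata template")
```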

How to get all article pages under a Wikipedia Category and its sub-categories?

|▌冷眼眸甩不掉的悲伤 Submitted on 2019-11-27 02:07:54
Question: I want to get all the article names under a category and its sub-categories. Options I'm aware of: using the Wikipedia API (does it have such an option?), or downloading the dump (which format would be better for my usage?). There is also an option to search Wikipedia with something like incategory:"music", but I didn't see an option to view that in XML. Please share your thoughts.

Answer 1: The following resource will help you to download all pages from the category and all its subcategories: http://en…
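On the API side this is list=categorymembers, which only returns direct members, so sub-categories have to be walked explicitly. A minimal sketch with the requests library, using Category:Music from the question's incategory:"music" example; a visited set guards against category loops, and in practice you would also cap the depth because such trees can be huge.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "catwalk-sketch/0.1"}

def category_members(cat, cmtype):
    """Yield members of one category (no recursion), following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": cat,
        "cmtype": cmtype,      # "page" or "subcat"
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, headers=HEADERS).json()
        for m in data["query"]["categorymembers"]:
            yield m["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries cmcontinue for the next batch

def walk(cat, seen=None):
    """Recursively collect article titles under a category and its sub-categories."""
    seen = set() if seen is None else seen
    if cat in seen:
        return set()
    seen.add(cat)
    articles = set(category_members(cat, "page"))
    for sub in category_members(cat, "subcat"):
        articles |= walk(sub, seen)
    return articles

print(len(walk("Category:Music")))
```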