
How to extract data from a Wikipedia article?

∥☆過路亽.° submitted on 2019-12-02 21:20:27
I have a question about parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (and also the JSON by replacing format=xml with format=json). But what I can't figure out is how to access only certain sections from the table of contents. What I want is: when the page is loaded, the user can press a button that makes a pop-up appear displaying the headers from the table of contents, letting the user read that piece, and only that piece, of the article.
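One way to approach this, sketched below: the parse module can return just the list of sections (prop=sections) to populate the pop-up, and then fetch a single section's HTML with the section=N parameter. This only builds the request URLs; fetching and JSON decoding are left to whatever HTTP client the app already uses.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # endpoint from the question

def sections_url(page):
    """URL that returns the table of contents, one entry per header."""
    return API + "?" + urlencode({
        "action": "parse", "page": page,
        "prop": "sections", "format": "json",
    })

def section_url(page, index):
    """URL that returns the rendered HTML of a single section
    (index comes from the sections list; 0 is the lead section)."""
    return API + "?" + urlencode({
        "action": "parse", "page": page,
        "prop": "text", "section": index, "format": "json",
    })

print(sections_url("Python_(programming_language)"))
print(section_url("Python_(programming_language)", 2))
```

The sections response carries a numeric `index` for each header; passing that back as `section=` returns only that chunk, which keeps the pop-up payload small.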

Reverse wikipedia geotagging lookup

时光总嘲笑我的痴心妄想 submitted on 2019-12-02 20:55:24
Wikipedia geotags a lot of its articles. (Look in the top right corner of the page.) Is there any API for querying all geotagged pages within a specified radius of a geographical position? Update: Okay, so based on lost-theory's answer I tried this (in the DBpedia query explorer): PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> SELECT ?subject ?label ?lat ?long WHERE { ?subject geo:lat ?lat. ?subject geo:long ?long. ?subject rdfs:label ?label. FILTER(xsd:float(?lat) - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat) <= 0.05 && xsd:float(?long) - 9.94513 <= 0.05 && 9.94513 - xsd:float(
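The query above is cut off mid-expression in the scrape. Judging by the symmetric pattern of the first three FILTER clauses, a complete bounding-box version would presumably look like the following (the 0.05-degree box and the coordinates are those from the truncated text; this is a reconstruction, not the asker's verbatim query):

```sparql
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?subject ?label ?lat ?long WHERE {
  ?subject geo:lat ?lat .
  ?subject geo:long ?long .
  ?subject rdfs:label ?label .
  FILTER(xsd:float(?lat)  - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat)  <= 0.05 &&
         xsd:float(?long) - 9.94513  <= 0.05 && 9.94513  - xsd:float(?long) <= 0.05)
}
```

Note this is a square box in degrees, not a true radius; for a circular radius you would need a distance function (e.g. Virtuoso's geo extensions) rather than four linear inequalities.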

Wikipedia api fulltext search to return articles with title, snippet and image

只愿长相守 submitted on 2019-12-02 17:44:22
I've been looking for a way to query the Wikipedia API with a search string for a list of articles with the following properties: title, snippet/description, and one or more images related to the article. I also have to make the query using JSONP. I've tried using the list=search parameter: http://en.wikipedia.org/w/api.php?action=query&list=search&prop=images&format=json&srsearch=test&srnamespace=0&srprop=snippet&srlimit=10&imlimit=1 But it seems to ignore prop=images. I've also tried variations using prop=imageinfo and prop=pageimages, but they all give me the same result as just using
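A likely explanation for the ignored prop=images: prop modules attach to the pages a *generator* yields, not to list=search result rows. Running the search as generator=search lets page props (thumbnail, extract) ride along with each hit. The sketch below only assembles the URL; the search term and limits mirror the question, and the callback parameter covers the JSONP requirement.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

# list=search rows don't carry prop=* data; a generator's pages do.
params = {
    "action": "query",
    "generator": "search",
    "gsrsearch": "test",        # list=search's sr* params become gsr* here
    "gsrnamespace": 0,
    "gsrlimit": 10,
    "prop": "pageimages|extracts",
    "exintro": 1,               # extract = intro only
    "explaintext": 1,           # plain text, no HTML
    "pithumbsize": 100,         # thumbnail width in px
    "format": "json",
    "callback": "handleResult", # JSONP, as the question requires
}
print(API + "?" + urlencode(params))
```

Each entry under `query.pages` should then carry `title`, an `extract` usable as the snippet, and (where one exists) a `thumbnail` object.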

Searching Wikipedia using API

牧云@^-^@ submitted on 2019-12-02 16:40:29
I want to search Wikipedia using the query action. I am using this URL: http://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=apple That works, but I want to get only the first result of the search. How can I do that? Note: that URL works fine when there is only one result. I just need the title and a short description. octosquidopus: I don't think you can do both in one query. 1. To get the first result, use the Opensearch API: https://en.wikipedia.org/w/api.php?action=opensearch&search=zyz&limit=1&namespace=0&format=jsonfm https://en.wikipedia.org/w/api.php
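An opensearch response is a four-element array: the query echoed back, then parallel lists of titles, descriptions, and URLs, so with limit=1 the first result is just index 0 of each. The payload below is canned for illustration (the wasp description is invented, not a live API response), and format=json rather than the jsonfm debug variant would be used in real code:

```python
import json

# Canned example of the opensearch response shape:
# [query, [titles], [descriptions], [urls]]
payload = json.loads("""
["zyz",
 ["Zyzzyx"],
 ["Zyzzyx is a genus of sand wasps."],
 ["https://en.wikipedia.org/wiki/Zyzzyx"]]
""")

query, titles, descriptions, urls = payload
first = {"title": titles[0], "description": descriptions[0], "url": urls[0]}
print(first["title"], "-", first["description"])
```

One caveat: later MediaWiki versions return empty description arrays from opensearch, in which case a second request (e.g. prop=extracts on the returned title) is needed for the short description.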

How to get all Wikipedia article titles?

这一生的挚爱 submitted on 2019-12-02 15:06:43
How do I get all Wikipedia article titles in one place, without extra characters and page ids? Just the article titles, something like this: When I download a Wikipedia dump, I get this. Maybe there is a method that might get me all pages, but I wanted to get them all in one take. Ainali: You'll find it on https://dumps.wikimedia.org The latest list of page titles in the main namespace for English Wikipedia as a database dump is here (69 MB). If you'd rather get it through the API, use action=query with list=allpages, but that only gives you a maximum of 500 (5,000 for bots) at a time, so you will have to make more than
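For the API route, paging works by feeding each response's continuation token back into the next request. A small sketch of the URL construction (the request loop itself is omitted; `apcontinue` in a real run comes from the previous response's `continue` block, and "Bermuda" below is just a placeholder value):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def allpages_url(apcontinue=None, limit=500):
    """Build one list=allpages request; pass the apcontinue value from
    the previous response to fetch the next batch of titles."""
    params = {
        "action": "query", "list": "allpages",
        "aplimit": limit,        # 500 max for normal users, 5000 for bots
        "apnamespace": 0,        # main namespace only
        "format": "json",
    }
    if apcontinue:
        params["apcontinue"] = apcontinue
    return API + "?" + urlencode(params)

print(allpages_url())
print(allpages_url(apcontinue="Bermuda"))
```

Given that English Wikipedia has millions of articles, this means thousands of requests, which is why the one-shot dump file is the better answer for "all titles in one take".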

XPath to get markup between two headings

。_饼干妹妹 submitted on 2019-12-02 03:50:12
I am trying to write a small application to extract content from Wikipedia pages. When I first thought of it, I figured I could just target the divs containing content with XPath, but after looking into how Wikipedia builds its articles, I quickly discovered it wouldn't be so easy. The best way to separate content, once I have the page, is to select what's between two sets of h2 tags. Example: <h2>Title</h2> <div>Some Content</div> <h2>Title</h2> Here I would want to get the div between the two headers. I tried doing this with XPath, but with no luck at all. I am going to look more into
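With an XPath engine that supports axes (e.g. lxml), the usual trick is a following-sibling expression keyed on the two h2s. But the same "everything between the Nth h2 and the next one" selection can also be done with a plain streaming parser, which sidesteps XPath entirely; a stdlib-only sketch:

```python
from html.parser import HTMLParser

class SectionExtractor(HTMLParser):
    """Collect the text that appears after the Nth <h2> and before the next one."""
    def __init__(self, section_index):
        super().__init__()
        self.section_index = section_index
        self.h2_count = 0      # how many <h2> openings we've passed
        self.in_h2 = False     # currently inside an <h2>...</h2>?
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.h2_count += 1

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # keep text only between the target h2 and the following h2,
        # skipping the heading text itself
        if not self.in_h2 and self.h2_count == self.section_index:
            self.chunks.append(data)

html = "<h2>Title</h2><div>Some Content</div><h2>Next</h2><p>Other</p>"
p = SectionExtractor(1)
p.feed(html)
print("".join(p.chunks).strip())  # → Some Content
```

This yields text rather than markup; collecting whole elements instead just means also recording start/end tags while the flag condition holds.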

filter data from mediawiki api ios

帅比萌擦擦* submitted on 2019-12-01 14:42:33
I used the "action=query&prop=revisions&rvprop=content&titles=%@&format=json&redirects" API to get the details about Anil_Ambani. In response I got the following dictionary: query = { normalized = ( { from = "Anil_Ambani"; to = "Anil Ambani"; } ); pages = { 1222313 = { ns = 0; pageid = 1222313; revisions = ( { "*" = "{{BLP sources|date=June 2012}}\n{{Infobox person\n| name = Anil Ambani \n| image =AnilAmbani.jpg\n| image_size = \n| caption = Ambani in 2009\n| birth_date = {{Birth date and age|1959|6|4|df=y}}\n| birth_place = [[Mumbai]], [[Maharashtra]], [[India]]\n| nationality =
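The dump above is just the revisions response printed in iOS's dictionary style: the wikitext sits under the "*" key of the first revision, inside a "pages" map keyed by a page id that varies per article, so you iterate over the values rather than hard-coding the id. A Python sketch with a miniature canned response (the wikitext is abbreviated; not a live API call):

```python
import json

# Tiny stand-in for an action=query&prop=revisions&rvprop=content response.
response = json.loads("""
{"query": {"pages": {"1222313": {"pageid": 1222313, "ns": 0,
  "title": "Anil Ambani",
  "revisions": [{"*": "{{Infobox person | name = Anil Ambani }} ..."}]}}}}
""")

# The page-id key ("1222313") changes per article, so walk the values.
for page in response["query"]["pages"].values():
    wikitext = page["revisions"][0]["*"]
    print(page["title"], "->", wikitext[:30])
```

What comes back is raw wikitext (templates, [[links]], infoboxes), so filtering out specific fields means either parsing the wikitext or switching to a friendlier module such as prop=extracts for plain text.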

Indexing wikipedia dump with solr

坚强是说给别人听的谎言 submitted on 2019-12-01 13:20:15
I have Solr 3.6.2 installed on my machine, running perfectly with Tomcat. I want to index a Wikipedia dump file using Solr. How do I do this using the DataImportHandler? Is there any other way? I don't have any knowledge of XML. The file I mentioned is around 45 GB when extracted. Any help would be greatly appreciated. Update: I tried doing what's said on the DataImportHandler page, but there is some error, maybe because their version of Solr is much older. My data-config: <dataConfig> <dataSource type="FileDataSource" encoding="UTF-8" /> <document> <entity name="page" processor=
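For reference, the old Solr wiki's Wikipedia example pairs FileDataSource with XPathEntityProcessor in streaming mode. A hedged sketch of such a data-config (the file path is a placeholder, the field columns must match fields declared in your schema.xml, and the XPaths follow the MediaWiki XML export schema):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/path/to/enwiki-latest-pages-articles.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="id"        xpath="/mediawiki/page/id" />
      <field column="title"     xpath="/mediawiki/page/title" />
      <field column="text"      xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp"
             dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
    </entity>
  </document>
</dataConfig>
```

stream="true" matters for a 45 GB file, since it keeps the handler from loading the whole document into memory.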

How to form dbPedia iSPARQL query (for wikipedia content)

最后都变了- submitted on 2019-12-01 13:02:39
Say I need to fetch content from Wikipedia about all mountains. My goal is to show the initial paragraph and an image from the respective article (e.g. Monte Rosa and Vincent Pyramid). I came to know about DBpedia, and with some research found that it provides live queries into the wiki database directly. I have two questions: 1. I am finding it difficult to formulate my queries; I can't get the hang of iSPARQL. I tried the following query, but it throws an error saying invalid XML. SELECT DISTINCT ?Mountain FROM <http://dbpedia.org> WHERE { [] rdf:type ?Mountain } 2. My requirement is to show only
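One issue with the attempted query: `[] rdf:type ?Mountain` lists every *type* in the store rather than matching mountains. To get the intro paragraph and an image per mountain, a query along these lines should be closer (dbo:Mountain, dbo:abstract, and dbo:thumbnail are standard DBpedia ontology terms, but exact coverage varies by article, hence the OPTIONAL):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?mountain ?abstract ?thumbnail
FROM <http://dbpedia.org>
WHERE {
  ?mountain rdf:type dbo:Mountain ;
            dbo:abstract ?abstract .
  OPTIONAL { ?mountain dbo:thumbnail ?thumbnail }
  FILTER (lang(?abstract) = "en")
}
LIMIT 50
```

dbo:abstract holds roughly the article's lead paragraph, and the language FILTER avoids one row per translation.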

Example using WikipediaTokenizer in Lucene

浪尽此生 submitted on 2019-12-01 12:53:14
I want to use the WikipediaTokenizer in a Lucene project: http://lucene.apache.org/java/3_0_2/api/contrib-wikipedia/org/apache/lucene/wikipedia/analysis/WikipediaTokenizer.html But I have never used Lucene. I just want to convert a Wikipedia string into a list of tokens. However, I see that there are only four methods available in this class: end, incrementToken, reset, and reset(reader). Can someone point me to an example of how to use it? Thank you. In Lucene 3.0, the next() method was removed. Now you should use incrementToken() to iterate through the tokens; it returns false when you reach the end of the input stream.
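The incrementToken() loop described above might look roughly like this against Lucene 3.0's contrib-wikipedia (a hedged, untested sketch: the class and package names come from the linked Javadoc, and in the TokenStream attribute API the token text is read via an attribute rather than returned by the iteration call; TermAttribute is the 3.0-era accessor, replaced by CharTermAttribute in later releases):

```java
import java.io.StringReader;
import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class WikiTokens {
    public static void main(String[] args) throws Exception {
        String wikitext = "'''Hello''' [[World|world]]!";
        WikipediaTokenizer tok = new WikipediaTokenizer(new StringReader(wikitext));
        // Attributes expose the current token's state during iteration.
        TermAttribute term = tok.addAttribute(TermAttribute.class);
        while (tok.incrementToken()) {   // false once the stream is exhausted
            System.out.println(term.term());
        }
        tok.close();
    }
}
```

Collecting the terms into a List<String> inside the loop gives the "list of tokens" the question asks for.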