
How to extract data from a Wikipedia article?

∥☆過路亽.° submitted on 2019-12-02 21:20:27
I have a question about parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (and also the JSON by replacing format=xml with format=json). But what I can't figure out is how to access only certain sections from the table of contents. What I want is: when the page is loaded, the user can press a button that makes a pop-up appear displaying the headers from the table of contents, letting the user read that piece, and only that piece, of the article.
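One way to approach this, sketched below: the parse module can return just the list of sections (prop=sections) to populate the pop-up, and then fetch a single section's HTML with the section=N parameter. This only builds the request URLs; fetching and JSON decoding are left to whatever HTTP client the app already uses.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # endpoint from the question

def sections_url(page):
    """URL that returns the table of contents, one entry per header."""
    return API + "?" + urlencode({
        "action": "parse", "page": page,
        "prop": "sections", "format": "json",
    })

def section_url(page, index):
    """URL that returns the rendered HTML of a single section
    (index comes from the sections list; 0 is the lead section)."""
    return API + "?" + urlencode({
        "action": "parse", "page": page,
        "prop": "text", "section": index, "format": "json",
    })

print(sections_url("Python_(programming_language)"))
print(section_url("Python_(programming_language)", 2))
```

The sections response carries a numeric `index` for each header; passing that back as `section=` returns only that chunk, which keeps the pop-up payload small.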

Reverse wikipedia geotagging lookup

时光总嘲笑我的痴心妄想 submitted on 2019-12-02 20:55:24
Wikipedia geotags a lot of its articles. (Look in the top right corner of the page.) Is there any API for querying all geotagged pages within a specified radius of a geographical position? Update: Okay, so based on lost-theory's answer I tried this (in the DBpedia query explorer): PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> SELECT ?subject ?label ?lat ?long WHERE { ?subject geo:lat ?lat. ?subject geo:long ?long. ?subject rdfs:label ?label. FILTER(xsd:float(?lat) - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat) <= 0.05 && xsd:float(?long) - 9.94513 <= 0.05 && 9.94513 - xsd:float(
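The query above is cut off mid-expression in the scrape. Judging by the symmetric pattern of the first three FILTER clauses, a complete bounding-box version would presumably look like the following (the 0.05-degree box and the coordinates are those from the truncated text; this is a reconstruction, not the asker's verbatim query):

```sparql
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?subject ?label ?lat ?long WHERE {
  ?subject geo:lat ?lat .
  ?subject geo:long ?long .
  ?subject rdfs:label ?label .
  FILTER(xsd:float(?lat)  - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat)  <= 0.05 &&
         xsd:float(?long) - 9.94513  <= 0.05 && 9.94513  - xsd:float(?long) <= 0.05)
}
```

Note this is a square box in degrees, not a true radius; for a circular radius you would need a distance function (e.g. Virtuoso's geo extensions) rather than four linear inequalities.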

Wikipedia api fulltext search to return articles with title, snippet and image

只愿长相守 submitted on 2019-12-02 17:44:22
I've been looking for a way to query the Wikipedia API with a search string for a list of articles with the following properties: title, snippet/description, and one or more images related to the article. I also have to make the query using JSONP. I've tried using the list=search parameter: http://en.wikipedia.org/w/api.php?action=query&list=search&prop=images&format=json&srsearch=test&srnamespace=0&srprop=snippet&srlimit=10&imlimit=1 But it seems to ignore prop=images. I've also tried variations using prop=imageinfo and prop=pageimages, but they all give me the same result as just using
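A likely explanation for the ignored prop=images: prop modules attach to the pages a *generator* yields, not to list=search result rows. Running the search as generator=search lets page props (thumbnail, extract) ride along with each hit. The sketch below only assembles the URL; the search term and limits mirror the question, and the callback parameter covers the JSONP requirement.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

# list=search rows don't carry prop=* data; a generator's pages do.
params = {
    "action": "query",
    "generator": "search",
    "gsrsearch": "test",        # list=search's sr* params become gsr* here
    "gsrnamespace": 0,
    "gsrlimit": 10,
    "prop": "pageimages|extracts",
    "exintro": 1,               # extract = intro only
    "explaintext": 1,           # plain text, no HTML
    "pithumbsize": 100,         # thumbnail width in px
    "format": "json",
    "callback": "handleResult", # JSONP, as the question requires
}
print(API + "?" + urlencode(params))
```

Each entry under `query.pages` should then carry `title`, an `extract` usable as the snippet, and (where one exists) a `thumbnail` object.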

Searching Wikipedia using API

牧云@^-^@ submitted on 2019-12-02 16:40:29
I want to search Wikipedia using the query action. I am using this URL: http://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=apple That works, but I want to get only the first result of the search. How can I do that? Note: that URL works fine when there is only one result. I just need the title and a short description. octosquidopus: I don't think you can do both in one query. 1. To get the first result, use the Opensearch API: https://en.wikipedia.org/w/api.php?action=opensearch&search=zyz&limit=1&namespace=0&format=jsonfm https://en.wikipedia.org/w/api.php
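An opensearch response is a four-element array: the query echoed back, then parallel lists of titles, descriptions, and URLs, so with limit=1 the first result is just index 0 of each. The payload below is canned for illustration (the wasp description is invented, not a live API response), and format=json rather than the jsonfm debug variant would be used in real code:

```python
import json

# Canned example of the opensearch response shape:
# [query, [titles], [descriptions], [urls]]
payload = json.loads("""
["zyz",
 ["Zyzzyx"],
 ["Zyzzyx is a genus of sand wasps."],
 ["https://en.wikipedia.org/wiki/Zyzzyx"]]
""")

query, titles, descriptions, urls = payload
first = {"title": titles[0], "description": descriptions[0], "url": urls[0]}
print(first["title"], "-", first["description"])
```

One caveat: later MediaWiki versions return empty description arrays from opensearch, in which case a second request (e.g. prop=extracts on the returned title) is needed for the short description.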

How to get all Wikipedia article titles?

这一生的挚爱 submitted on 2019-12-02 15:06:43
How do I get all Wikipedia article titles in one place, without extra characters and page ids? Just the article titles, something like this: When I download a Wikipedia dump, I get this. Maybe there is a method that might get me all pages, but I wanted to get them all in one take. Ainali: You'll find it on https://dumps.wikimedia.org The latest list of page titles in the main namespace for English Wikipedia as a database dump is here (69 MB). If you'd rather get it through the API, use action=query with list=allpages, but that only gives you a maximum of 500 (5,000 for bots) at a time, so you will have to make more than
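For the API route, paging works by feeding each response's continuation token back into the next request. A small sketch of the URL construction (the request loop itself is omitted; `apcontinue` in a real run comes from the previous response's `continue` block, and "Bermuda" below is just a placeholder value):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def allpages_url(apcontinue=None, limit=500):
    """Build one list=allpages request; pass the apcontinue value from
    the previous response to fetch the next batch of titles."""
    params = {
        "action": "query", "list": "allpages",
        "aplimit": limit,        # 500 max for normal users, 5000 for bots
        "apnamespace": 0,        # main namespace only
        "format": "json",
    }
    if apcontinue:
        params["apcontinue"] = apcontinue
    return API + "?" + urlencode(params)

print(allpages_url())
print(allpages_url(apcontinue="Bermuda"))
```

Given that English Wikipedia has millions of articles, this means thousands of requests, which is why the one-shot dump file is the better answer for "all titles in one take".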

XPath to get markup between two headings

。_饼干妹妹 submitted on 2019-12-02 03:50:12
I am trying to write a small application to extract content from Wikipedia pages. When I first thought of it, I figured I could just target the divs containing content with XPath, but after looking into how Wikipedia builds its articles, I quickly discovered it wouldn't be so easy. The best way to separate content, once I have the page, is to select what's between two sets of h2 tags. Example: <h2>Title</h2> <div>Some Content</div> <h2>Title</h2> Here I would want to get the div between the two headers. I tried doing this with XPath, but with no luck at all. I am going to look more into
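With an XPath engine that supports axes (e.g. lxml), the usual trick is a following-sibling expression keyed on the two h2s. But the same "everything between the Nth h2 and the next one" selection can also be done with a plain streaming parser, which sidesteps XPath entirely; a stdlib-only sketch:

```python
from html.parser import HTMLParser

class SectionExtractor(HTMLParser):
    """Collect the text that appears after the Nth <h2> and before the next one."""
    def __init__(self, section_index):
        super().__init__()
        self.section_index = section_index
        self.h2_count = 0      # how many <h2> openings we've passed
        self.in_h2 = False     # currently inside an <h2>...</h2>?
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.h2_count += 1

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # keep text only between the target h2 and the following h2,
        # skipping the heading text itself
        if not self.in_h2 and self.h2_count == self.section_index:
            self.chunks.append(data)

html = "<h2>Title</h2><div>Some Content</div><h2>Next</h2><p>Other</p>"
p = SectionExtractor(1)
p.feed(html)
print("".join(p.chunks).strip())  # → Some Content
```

This yields text rather than markup; collecting whole elements instead just means also recording start/end tags while the flag condition holds.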

filter data from mediawiki api ios

帅比萌擦擦* submitted on 2019-12-01 14:42:33
I used the "action=query&prop=revisions&rvprop=content&titles=%@&format=json&redirects" API to get the details about Anil_Ambani. In response I got the following dictionary: query = { normalized = ( { from = "Anil_Ambani"; to = "Anil Ambani"; } ); pages = { 1222313 = { ns = 0; pageid = 1222313; revisions = ( { "*" = "{{BLP sources|date=June 2012}}\n{{Infobox person\n| name = Anil Ambani \n| image =AnilAmbani.jpg\n| image_size = \n| caption = Ambani in 2009\n| birth_date = {{Birth date and age|1959|6|4|df=y}}\n| birth_place = [[Mumbai]], [[Maharashtra]], [[India]]\n| nationality =
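The dump above is just the revisions response printed in iOS's dictionary style: the wikitext sits under the "*" key of the first revision, inside a "pages" map keyed by a page id that varies per article, so you iterate over the values rather than hard-coding the id. A Python sketch with a miniature canned response (the wikitext is abbreviated; not a live API call):

```python
import json

# Tiny stand-in for an action=query&prop=revisions&rvprop=content response.
response = json.loads("""
{"query": {"pages": {"1222313": {"pageid": 1222313, "ns": 0,
  "title": "Anil Ambani",
  "revisions": [{"*": "{{Infobox person | name = Anil Ambani }} ..."}]}}}}
""")

# The page-id key ("1222313") changes per article, so walk the values.
for page in response["query"]["pages"].values():
    wikitext = page["revisions"][0]["*"]
    print(page["title"], "->", wikitext[:30])
```

What comes back is raw wikitext (templates, [[links]], infoboxes), so filtering out specific fields means either parsing the wikitext or switching to a friendlier module such as prop=extracts for plain text.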

Indexing wikipedia dump with solr

坚强是说给别人听的谎言 submitted on 2019-12-01 13:20:15
I have Solr 3.6.2 installed on my machine, running perfectly with Tomcat. I want to index a Wikipedia dump file using Solr. How do I do this using the DataImportHandler? Is there any other way? I don't have any knowledge of XML. The file I mentioned is around 45 GB when extracted. Any help would be greatly appreciated. Update: I tried doing what's said on the DataImportHandler page, but there is some error, maybe because their version of Solr is much older. My data-config: <dataConfig> <dataSource type="FileDataSource" encoding="UTF-8" /> <document> <entity name="page" processor=
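For reference, the old Solr wiki's Wikipedia example pairs FileDataSource with XPathEntityProcessor in streaming mode. A hedged sketch of such a data-config (the file path is a placeholder, the field columns must match fields declared in your schema.xml, and the XPaths follow the MediaWiki XML export schema):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/path/to/enwiki-latest-pages-articles.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="id"        xpath="/mediawiki/page/id" />
      <field column="title"     xpath="/mediawiki/page/title" />
      <field column="text"      xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp"
             dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
    </entity>
  </document>
</dataConfig>
```

stream="true" matters for a 45 GB file, since it keeps the handler from loading the whole document into memory.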

How to form dbPedia iSPARQL query (for wikipedia content)

最后都变了- submitted on 2019-12-01 13:02:39
Say I need to fetch content from Wikipedia about all mountains. My goal is to show the initial paragraph and an image from the respective article (e.g. Monte Rosa and Vincent Pyramid). I came to know about DBpedia, and with some research found that it provides live queries into the wiki database directly. I have two questions: 1. I am finding it difficult to formulate my queries; I can't get the hang of iSPARQL. I tried the following query, but it throws an error saying invalid XML. SELECT DISTINCT ?Mountain FROM <http://dbpedia.org> WHERE { [] rdf:type ?Mountain } 2. My requirement is to show only
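One issue with the attempted query: `[] rdf:type ?Mountain` lists every *type* in the store rather than matching mountains. To get the intro paragraph and an image per mountain, a query along these lines should be closer (dbo:Mountain, dbo:abstract, and dbo:thumbnail are standard DBpedia ontology terms, but exact coverage varies by article, hence the OPTIONAL):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?mountain ?abstract ?thumbnail
FROM <http://dbpedia.org>
WHERE {
  ?mountain rdf:type dbo:Mountain ;
            dbo:abstract ?abstract .
  OPTIONAL { ?mountain dbo:thumbnail ?thumbnail }
  FILTER (lang(?abstract) = "en")
}
LIMIT 50
```

dbo:abstract holds roughly the article's lead paragraph, and the language FILTER avoids one row per translation.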

Example using WikipediaTokenizer in Lucene

浪尽此生 submitted on 2019-12-01 12:53:14
I want to use the WikipediaTokenizer in a Lucene project: http://lucene.apache.org/java/3_0_2/api/contrib-wikipedia/org/apache/lucene/wikipedia/analysis/WikipediaTokenizer.html But I have never used Lucene. I just want to convert a Wikipedia string into a list of tokens. However, I see that there are only four methods available in this class: end, incrementToken, reset, and reset(reader). Can someone point me to an example of how to use it? Thank you. In Lucene 3.0, the next() method was removed. Now you should use incrementToken() to iterate through the tokens; it returns false when you reach the end of the input stream.
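The incrementToken() loop described above might look roughly like this against Lucene 3.0's contrib-wikipedia (a hedged, untested sketch: the class and package names come from the linked Javadoc, and in the TokenStream attribute API the token text is read via an attribute rather than returned by the iteration call; TermAttribute is the 3.0-era accessor, replaced by CharTermAttribute in later releases):

```java
import java.io.StringReader;
import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class WikiTokens {
    public static void main(String[] args) throws Exception {
        String wikitext = "'''Hello''' [[World|world]]!";
        WikipediaTokenizer tok = new WikipediaTokenizer(new StringReader(wikitext));
        // Attributes expose the current token's state during iteration.
        TermAttribute term = tok.addAttribute(TermAttribute.class);
        while (tok.incrementToken()) {   // false once the stream is exhausted
            System.out.println(term.term());
        }
        tok.close();
    }
}
```

Collecting the terms into a List<String> inside the loop gives the "list of tokens" the question asks for.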