wikipedia

Retrieving the Interlanguage links from an exported Wikipedia article?

别来无恙 submitted on 2019-12-05 00:25:08
Question: I used to retrieve the interlanguage links from an exported Wikipedia article by parsing the export with some regular expressions. In phase 1 of the Wikidata project these links were moved to a separate page on Wikidata. For example, the article Ore Mountains no longer has any language links in its export; they are now on Q4198. How can I export the language links? Answer 1: You are now encouraged to use the Wikidata API: http://wikidata.org/w/api.php For your case, use props…
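The answer is cut off above. As a hedged sketch of where it is likely headed: the wbgetentities module with props=sitelinks returns every language link for an item in a single call (the page title is just the example from the question, and plain file_get_contents is assumed to get through; Wikimedia may require an explicit User-Agent):

<?php
$url = "https://www.wikidata.org/w/api.php"
     . "?action=wbgetentities&sites=enwiki&titles=Ore%20Mountains"
     . "&props=sitelinks&format=json";
$data = json_decode(file_get_contents($url), true);
foreach ($data['entities'] as $id => $entity) {          // $id is e.g. "Q4198"
    foreach ($entity['sitelinks'] as $site => $link) {   // one entry per language wiki
        echo $site . " => " . $link['title'] . "\n";     // e.g. "dewiki => Erzgebirge"
    }
}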

Extracting data from Wikipedia API

余生颓废 submitted on 2019-12-04 23:41:53
Question: I would like to be able to extract a title and description from Wikipedia using JSON. So Wikipedia isn't my problem; I'm new to JSON and would like to know how to use it. I know there are hundreds of tutorials, but I've been working for hours and it just doesn't display anything. Here's my code:

<?php
$url = "http://en.wikipedia.org/w/api.php?action=query&prop=extracts|info&exintro&titles=google&format=json&explaintext&redirects&inprop=url";
$json = file_get_contents($url);
$data = json_decode($json, true); // the snippet was cut off at "$data = json"; json_decode is the evident intent
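As a hedged sketch of the step the question is missing: once decoded, the fields live under query → pages, keyed by page ID:

$pages = $data['query']['pages'];
foreach ($pages as $pageid => $page) {
    echo $page['title'] . "\n";     // page title
    echo $page['extract'] . "\n";   // plain-text intro (from exintro + explaintext)
    echo $page['fullurl'] . "\n";   // full article URL (from inprop=url)
}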

WebRequest to connect to the Wikipedia API

廉价感情. submitted on 2019-12-04 19:31:41
Question: This may be a pathetically simple problem, but I cannot seem to format the POST web request/response to get data from the Wikipedia API. I have posted my code below; perhaps someone can help me spot the problem.

string pgTitle = txtPageTitle.Text;
Uri address = new Uri("http://en.wikipedia.org/w/api.php");
HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
string action = "query";
// The snippet is cut off after this point; a plausible completion builds the
// form-encoded body and writes it to the request stream:
string query = "action=" + action + "&titles=" + Uri.EscapeDataString(pgTitle) + "&prop=extracts&format=json";
byte[] body = Encoding.UTF8.GetBytes(query);
request.ContentLength = body.Length;
using (Stream stream = request.GetRequestStream())
    stream.Write(body, 0, body.Length);
using (StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream()))
    Console.WriteLine(reader.ReadToEnd());

How to access Wikipedia

℡╲_俬逩灬. submitted on 2019-12-04 17:41:44
Question: I want to fetch HTML content from Wikipedia, but it is showing "access denied". How can I access the wiki? Please give some suggestions. Answer: Use HttpWebRequest and set a User-Agent header; Wikipedia rejects requests that do not identify themselves, which is why the default request is denied. Try the following:

string url = "http://www.wikipedia.org/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Decode the body with the charset the server actually declared
Encoding enc = Encoding.GetEncoding(response.CharacterSet);
StreamReader reader = new StreamReader(response.GetResponseStream(), enc);
string html = reader.ReadToEnd();

Wikipedia API: get parsed introduction only

♀尐吖头ヾ submitted on 2019-12-04 16:50:56
Question: Using PHP, is there a nice way to get the (parsed) introduction only from a Wikipedia page? I have two current methods. The first is to call the API page and then run the wiki parser over the introduction pulled from the first request (two requests, and extracting the intro from the text isn't pretty either). The second is to call the entire page parser and use XPath to retrieve every <p> tag before the table of contents. With both methods I then have to re-parse the HTML to ensure the relevant links inside the introduction point back to Wikipedia. Neither is really ideal; there must be a…
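A hedged sketch of a single-request alternative (the standard parse module; the page title is just an example): action=parse with section=0 returns only the parsed lead section as HTML:

<?php
$url = "https://en.wikipedia.org/w/api.php"
     . "?action=parse&page=Miles_Davis&section=0&prop=text&format=json";
$data = json_decode(file_get_contents($url), true);
$introHtml = $data['parse']['text']['*']; // HTML of the introduction only
// Internal links come back as relative hrefs (/wiki/...), so one string
// replacement makes them absolute instead of a full HTML re-parse:
$introHtml = str_replace('href="/wiki/', 'href="https://en.wikipedia.org/wiki/', $introHtml);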

Wikipedia text download

早过忘川 submitted on 2019-12-04 15:35:29
Question: I am looking to download the full Wikipedia text for a college project. Do I have to write my own spider to download this, or is there a public dataset of Wikipedia available online? To give some overview of my project: I want to find the interesting words of a few articles I am interested in. To find these interesting words, I am planning to apply tf/idf to calculate a term score for each word and pick the ones with high scores. But to calculate the tf, I need to know the…
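For reference, a minimal sketch of the tf/idf weighting the asker describes (the standard formulation, not code from this thread): a word scores highly when it is frequent in one document but rare across the collection. The toy documents below are made up for illustration:

<?php
$docs = [
    "the ore mountains are a mountain range",
    "the api returns a json document",
    "a mountain range in central europe",
];
$tokenized = array_map(fn($d) => str_word_count($d, 1), $docs);
$N = count($tokenized);

$df = []; // document frequency: in how many documents each word appears
foreach ($tokenized as $words)
    foreach (array_unique($words) as $w)
        $df[$w] = ($df[$w] ?? 0) + 1;

foreach ($tokenized as $i => $words) {
    $tf = array_count_values($words); // raw term counts in this document
    foreach ($tf as $w => $n) {
        $tfidf = ($n / count($words)) * log($N / $df[$w]);
        printf("doc %d: %-10s %.3f\n", $i, $w, $tfidf);
    }
}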

How to get all links and their Wikidata IDs for a Wikipedia page?

纵饮孤独 submitted on 2019-12-04 15:12:47
Question: (When) will the following be possible? 1. Get the list of all links on a Wikipedia page, with their respective Wikidata IDs, in a single query/API call. 2. Receive additional information about the respective Wikidata items, such as a property value, with the same query. Answer: To get all of a Wikipedia page's links you use the Wikipedia API, and to get Wikidata item properties you need the Wikidata API, so a single query cannot cover both. But the first part of your question is already possible. As for the second: you didn't say what information you need from…
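A hedged sketch of part 1 (the page title is just an example): generator=links turns every link on the page into a result row, and prop=pageprops&ppprop=wikibase_item attaches each row's Wikidata ID, all in one call:

<?php
$url = "https://en.wikipedia.org/w/api.php"
     . "?action=query&generator=links&titles=Ore%20Mountains&gpllimit=500"
     . "&prop=pageprops&ppprop=wikibase_item&format=json";
$data = json_decode(file_get_contents($url), true);
foreach ($data['query']['pages'] as $page) {
    $qid = $page['pageprops']['wikibase_item'] ?? null; // missing for red links etc.
    echo $page['title'] . " => " . ($qid ?? "(no Wikidata item)") . "\n";
}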

Find main category for article using Wikipedia API

╄→尐↘猪︶ㄣ submitted on 2019-12-04 13:33:07
Question: I have a list of articles and I want to find the main category of each one. Wikipedia lists its main categories here: http://en.wikipedia.org/wiki/Portal:Contents/Categories . I am able to find the categories of each article using: http://en.wikipedia.org/w/api.php?action=query&prop=categories&titles=%s&format=xml I am also able to check whether a page belongs to a given category: http://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=categories&clcategories=Domesticated animals&format=xml This tells me whether "Domesticated animals" is a category of Dog, but this is not…
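The question is cut off, but a hedged sketch of one common approach (not necessarily this thread's accepted answer): climb the category graph from the article upward until a title from Portal:Contents/Categories is hit, capping the depth because the graph contains cycles:

<?php
// Top-level names would be copied from Portal:Contents/Categories by hand.
$mainCategories = [/* e.g. the headings listed on the portal page */];

function parentCategories(string $title): array {
    $url = "https://en.wikipedia.org/w/api.php?action=query&prop=categories"
         . "&clshow=!hidden&cllimit=max&titles=" . urlencode($title) . "&format=json";
    $data = json_decode(file_get_contents($url), true);
    $pages = $data['query']['pages'] ?? [];
    $page = reset($pages);
    return $page && isset($page['categories'])
        ? array_map(fn($c) => $c['title'], $page['categories'])
        : [];
}

$frontier = parentCategories("Dog");
for ($depth = 0; $depth < 5; $depth++) {
    foreach ($frontier as $cat)
        if (in_array(substr($cat, strlen("Category:")), $mainCategories, true))
            echo "main category reached: $cat (depth $depth)\n";
    $next = [];
    foreach ($frontier as $cat)
        $next = array_merge($next, parentCategories($cat));
    $frontier = array_unique($next);
}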

Wikipedia list=search REST API: how to also retrieve URLs of matching articles

半世苍凉 submitted on 2019-12-04 11:15:03
Question: I'm studying the Wikipedia REST API, but I'm not able to find the right option to also get URLs for a search query. This is the URL of the request: http://it.wikipedia.org/w/api.php?action=query&list=search&srsearch=calvino&format=xml&srprop=snippet This request outputs only the title and the snippet, but no URLs for the articles. I've checked the Wikipedia API documentation for the list=search query, but there seems to be no option to also get URLs. Best regards, Fabio Buda Answer 1: You can form the URL of the…
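The answer is truncated, but a hedged sketch of where it is headed: an article's URL is derivable directly from its title (spaces become underscores), or the API can return it for you if the search is run as a generator with prop=info&inprop=url:

<?php
// Option 1: build the URL from the title (standard wiki URL convention)
$title = "Italo Calvino"; // example title from a search result
$url = "https://it.wikipedia.org/wiki/" . rawurlencode(str_replace(' ', '_', $title));

// Option 2: let the API return fullurl alongside each hit
$api = "https://it.wikipedia.org/w/api.php"
     . "?action=query&generator=search&gsrsearch=calvino&gsrlimit=20"
     . "&prop=info&inprop=url&format=json";
$data = json_decode(file_get_contents($api), true);
foreach ($data['query']['pages'] as $page)
    echo $page['title'] . " => " . $page['fullurl'] . "\n";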

Query Wikipedia pages with properties

南笙酒味 submitted on 2019-12-04 07:35:30
Question: I need to use the Wikipedia API (Query, or any other API such as OpenSearch) to fetch a simple list of pages with some properties.

Input: a list of page (article) titles or IDs.
Output: a list of pages, each with the following properties:
- page id
- title
- snippet/description (like in the OpenSearch API)
- page url
- image url (like in the OpenSearch API)

A result similar to this: http://en.wikipedia.org/w/api.php?action=opensearch&search=miles%20davis&limit=20&format=xml only with page IDs, and not for a search but rather for an exact list of pages given by either titles or pageids. This should be a fairly simple…
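A hedged sketch combining standard query modules (not a confirmed answer from the thread): prop=info|extracts|pageimages covers URL, description, and image for an exact list of titles, or pageids, in one call:

<?php
$api = "https://en.wikipedia.org/w/api.php?action=query"
     . "&titles=" . urlencode("Miles Davis|John Coltrane") // or &pageids=...
     . "&prop=info|extracts|pageimages&inprop=url"
     . "&exintro&explaintext&exlimit=max"
     . "&piprop=thumbnail&pithumbsize=200&pilimit=max&format=json";
$data = json_decode(file_get_contents($api), true);
foreach ($data['query']['pages'] as $pageid => $page) {
    echo $pageid . " | " . $page['title'] . " | " . $page['fullurl'] . "\n";
    echo ($page['extract'] ?? '(no extract)') . "\n";
    echo ($page['thumbnail']['source'] ?? '(no image)') . "\n";
}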