wikipedia

How to group Wikipedia categories in Python?

北城以北 submitted on 2019-12-03 12:15:50
Question: For each concept in my dataset I have stored the corresponding Wikipedia categories. For example, consider the following 5 concepts and their corresponding Wikipedia categories:
hypertriglyceridemia: ['Category:Lipid metabolism disorders', 'Category:Medical conditions related to obesity']
enzyme inhibitor: ['Category:Enzyme inhibitors', 'Category:Medicinal chemistry', 'Category:Metabolism']
bypass surgery: ['Category:Surgery stubs', 'Category:Surgical procedures and techniques']
perth: [
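Below is a minimal sketch of one way to do the grouping, assuming the data sits in a plain Python dict mapping each concept to its category list (the variable names are hypothetical); concepts that share a category end up in the same group.

```python
from collections import defaultdict

# Hypothetical input: concept -> list of Wikipedia categories
concept_categories = {
    "hypertriglyceridemia": ["Category:Lipid metabolism disorders",
                             "Category:Medical conditions related to obesity"],
    "enzyme inhibitor": ["Category:Enzyme inhibitors",
                         "Category:Medicinal chemistry",
                         "Category:Metabolism"],
    "bypass surgery": ["Category:Surgery stubs",
                       "Category:Surgical procedures and techniques"],
}

# Invert the mapping: category -> concepts. Concepts that share at
# least one category land in the same bucket.
groups = defaultdict(list)
for concept, categories in concept_categories.items():
    for category in categories:
        groups[category].append(concept)

for category, members in groups.items():
    print(category, "->", members)
```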

Wikipedia Category Hierarchy from dumps

谁说胖子不能爱 submitted on 2019-12-03 10:20:01
Question: Using Wikipedia's dumps I want to build a hierarchy for its categories. I have downloaded the main dump (enwiki-latest-pages-articles) and the category SQL dump (enwiki-latest-category), but I can't find the hierarchy information. For example, the category SQL dump has entries for each category, but I can't find anything about how they relate to each other. The other dump (latest-pages-articles) lists the parent categories for each page, but in an unordered way. It just states all the parents
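For what it's worth, the subcategory relationships live in the categorylinks dump (enwiki-latest-categorylinks.sql.gz), not in the category table itself: rows with cl_type = 'subcat' link a subcategory page (cl_from, a page id) to its parent category title (cl_to). A rough sketch, assuming that dump and the page dump have been imported into a local MySQL database (the connection details and database name are placeholders):

```python
import pymysql

# Placeholder connection details; assumes the categorylinks and page
# dumps have been loaded into a local database called "enwiki".
conn = pymysql.connect(host="localhost", user="wiki", password="wiki",
                       database="enwiki", charset="utf8mb4")

# Each categorylinks row with cl_type = 'subcat' links a subcategory
# page (cl_from) to its parent category title (cl_to); joining against
# the page table turns the page id back into the subcategory's title.
query = """
    SELECT p.page_title AS child, cl.cl_to AS parent
    FROM categorylinks cl
    JOIN page p ON p.page_id = cl.cl_from
    WHERE cl.cl_type = 'subcat'
    LIMIT 20
"""
with conn.cursor() as cur:
    cur.execute(query)
    for child, parent in cur.fetchall():
        child = child.decode() if isinstance(child, bytes) else child
        parent = parent.decode() if isinstance(parent, bytes) else parent
        print(child, "-> subcategory of ->", parent)
conn.close()
```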

Wikipedia text download

大城市里の小女人 submitted on 2019-12-03 09:44:24
I am looking to download the full Wikipedia text for my college project. Do I have to write my own spider to download this, or is there a public dataset of Wikipedia available online? To give you some overview of my project: I want to find the interesting words of a few articles I am interested in. To find these interesting words, I am planning to apply tf/idf to calculate the term frequency for each word and pick the ones with high frequency. But to calculate the tf, I need to know the total occurrences in the whole of Wikipedia. How can this be done? from wikipedia: http://en.wikipedia.org
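A minimal sketch of the tf/idf step with scikit-learn, assuming the articles' plain text has already been extracted from the dump into local files (the file names are placeholders):

```python
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder file names: plain-text articles already extracted
# from the Wikipedia dump.
documents = [Path(name).read_text(encoding="utf-8")
             for name in ["article1.txt", "article2.txt", "article3.txt"]]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)   # shape: (n_docs, n_terms)
terms = vectorizer.get_feature_names_out()

# Top-scoring terms of the first article
row = tfidf[0].toarray().ravel()
top = row.argsort()[::-1][:10]
print([(terms[i], round(row[i], 3)) for i in top])
```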

Random seek in a 7z single-file archive

夙愿已清 submitted on 2019-12-03 09:03:07
Is it possible to do random access (a lot of seeks) into a very large file compressed by 7zip? The original file is very large (999 GB of XML) and I can't store it in unpacked format (I don't have that much free space). So, if the 7z format allows accessing a middle block without uncompressing all blocks before the selected one, I can build an index of block beginnings and the corresponding original-file offsets. The header of my 7z archive is
37 7A BC AF 27 1C 00 02 28 99 F1 9D 4A 46 D7 EA // 7z archive version 2; crc; n.hdr offset
00 00 00 00 44 00 00 00 00 00 00 00 F4 56 CF 92 // n.hdr offset; n.hdr size=44; crc
00 1E 1B
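For reference, the 32 bytes quoted above are the 7z signature header; a small sketch that parses them to locate the end header (this alone does not give per-block random access, but it is the starting point for reading the archive's stream index). The file name is a placeholder.

```python
import struct

# Read the fixed 32-byte 7z signature header:
#   6-byte magic, 2-byte version, 4-byte start-header CRC,
#   then next-header offset (u64), size (u64) and CRC (u32),
# all little-endian. The offset is counted from the end of
# this 32-byte signature header.
with open("archive.7z", "rb") as f:
    sig = f.read(32)

magic, ver_major, ver_minor, start_crc, nh_offset, nh_size, nh_crc = \
    struct.unpack("<6sBBIQQI", sig)

assert magic == b"7z\xbc\xaf\x27\x1c", "not a 7z archive"
print(f"format version {ver_major}.{ver_minor}")
print(f"next header at file offset {32 + nh_offset}, size {nh_size} bytes")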

How to extract data from a Wikipedia article?

雨燕双飞 submitted on 2019-12-03 07:52:42
Question: I have a question regarding parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (and also the JSON by replacing format=xml with format=json). But what I can't figure out is how to access only certain sections from the table of contents. What I want is: when the page is loaded, the user can press a button that makes a pop-up appear that displays
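A hedged sketch of one way to do this with action=parse: prop=sections lists the table of contents with section indexes, and a second request with section=N returns only that section's HTML. The article name here is a placeholder.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
article = "Cloud"  # placeholder article name

# 1) List the article's sections (its table of contents).
toc = requests.get(API, params={
    "action": "parse", "page": article,
    "prop": "sections", "format": "json",
}).json()["parse"]["sections"]

for s in toc:
    print(s["index"], s["line"])

# 2) Fetch the HTML of a single section by its index.
section_html = requests.get(API, params={
    "action": "parse", "page": article, "section": toc[0]["index"],
    "prop": "text", "format": "json",
}).json()["parse"]["text"]["*"]
print(section_html[:200])
```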

Reverse wikipedia geotagging lookup

倾然丶 夕夏残阳落幕 submitted on 2019-12-03 07:24:49
Question: Wikipedia geotags a lot of its articles. (Look in the top right corner of the page.) Is there any API for querying all geotagged pages within a specified radius of a geographical position? Update: Okay, based on lost-theory's answer I tried this (in the DBpedia query explorer): PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> SELECT ?subject ?label ?lat ?long WHERE { ?subject geo:lat ?lat. ?subject geo:long ?long. ?subject rdfs:label ?label. FILTER(xsd:float(?lat) - 57.03185 <= 0.05
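Besides the DBpedia route, the MediaWiki API's GeoData extension offers list=geosearch, which returns geotagged pages within a radius of a coordinate. A small sketch (the coordinate is just an example):

```python
import requests

# list=geosearch: pages geotagged within gsradius metres of gscoord.
resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query",
    "list": "geosearch",
    "gscoord": "57.03185|9.94062",   # "lat|lon", example coordinate
    "gsradius": 5000,                # metres
    "gslimit": 20,
    "format": "json",
}).json()

for page in resp["query"]["geosearch"]:
    print(page["title"], page["lat"], page["lon"], page["dist"])
```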

Wikipedia list=search REST API: how to also retrieve URLs of matching articles

為{幸葍}努か submitted on 2019-12-03 06:17:18
I'm studying the Wikipedia REST API but I'm not able to find the right option to also get URLs for a search query. This is the URL of the request: http://it.wikipedia.org/w/api.php?action=query&list=search&srsearch=calvino&format=xml&srprop=snippet This request outputs only the title and the snippet, but no URLs for the articles. I've checked the Wikipedia API documentation for the list=search query, but it seems there is no option to also get URLs. Best Regards, Fabio Buda
You can form the URL of the article easily by yourself from the title. For the Italian Wikipedia, it's http://it.wikipedia.org/wiki/
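A small sketch of that approach: run the same list=search query and build each hit's URL from its title (spaces become underscores, then percent-encode).

```python
from urllib.parse import quote

import requests

resp = requests.get("https://it.wikipedia.org/w/api.php", params={
    "action": "query", "list": "search", "srsearch": "calvino",
    "srprop": "snippet", "format": "json",
}).json()

for hit in resp["query"]["search"]:
    title = hit["title"]
    # Article URLs are just the title with spaces as underscores.
    url = "https://it.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))
    print(title, url)
```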

How to get a Wikipedia page in multiple languages?

雨燕双飞 submitted on 2019-12-03 05:17:54
Question: How can I get the same Wikipedia page in another language? For example, I want to get this page in Japanese: http://en.wikipedia.org/wiki/Cloud The result should be http://ja.wikipedia.org/wiki/雲 or only the title 雲. Is it possible to use the Wikipedia API or any other APIs to do this? Thank you.
Answer 1: The langlinks property of the MediaWiki API is probably what you want. Fetching the other languages for your Cloud example would look like this.
Source: https://stackoverflow.com/questions/4420584/how-to-get-wikipedia-page-in
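A minimal sketch of that langlinks query for the Cloud example:

```python
import requests

# prop=langlinks returns interlanguage links for a page; lllang
# restricts the result to a single target language.
resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query",
    "titles": "Cloud",
    "prop": "langlinks",
    "lllang": "ja",
    "format": "json",
}).json()

page = next(iter(resp["query"]["pages"].values()))
for link in page.get("langlinks", []):
    print(link["lang"], link["*"])   # e.g. "ja 雲"
```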

Wikipedia API full-text search to return articles with title, snippet and image

百般思念 submitted on 2019-12-03 04:32:51
Question: I've been looking for a way to query the Wikipedia API based on a search string for a list of articles with the following properties: title, snippet/description, and one or more images related to the article. I also have to make the query using JSONP. I've tried using the list=search parameter: http://en.wikipedia.org/w/api.php?action=query&list=search&prop=images&format=json&srsearch=test&srnamespace=0&srprop=snippet&srlimit=10&imlimit=1 But it seems to ignore prop=images. I've also tried
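One approach that is often suggested for this is to use generator=search instead of list=search, so that page-level props (thumbnail via pageimages, intro text via extracts) are attached to every hit in a single request; a hedged sketch (for JSONP you would add a callback parameter rather than decoding JSON directly):

```python
import requests

resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query",
    "generator": "search",        # search hits become the page set
    "gsrsearch": "test",
    "gsrlimit": 10,
    "prop": "pageimages|extracts",
    "piprop": "thumbnail",        # one thumbnail per article
    "exintro": 1,                 # intro text only
    "explaintext": 1,
    "exlimit": "max",
    "format": "json",
}).json()

for page in resp["query"]["pages"].values():
    thumb = page.get("thumbnail", {}).get("source")
    print(page["title"], thumb, (page.get("extract") or "")[:80])
```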

Searching Wikipedia using API

守給你的承諾、 submitted on 2019-12-03 03:10:05
Question: I want to search Wikipedia using the query action. I am using this URL: http://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=apple That works, but I want to get the first result of the search. How can I do that? Note: that URL works fine when there is only one result. I just need the title and a short description.
Answer 1: I don't think you can do both in one query. 1. To get the first result, use the Opensearch API. https://en.wikipedia.org/w/api.php?action
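A small sketch of the Opensearch route, which returns parallel lists of titles, descriptions and URLs, so the first result is simply index 0 of each list:

```python
import requests

resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "opensearch",
    "search": "apple",
    "limit": 1,
    "format": "json",
}).json()

# Response shape: [search_term, [titles], [descriptions], [urls]]
term, titles, descriptions, urls = resp
if titles:
    print(titles[0], descriptions[0], urls[0])
```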