wikipedia

How to group Wikipedia categories in Python?

北城以北 submitted on 2019-12-03 12:15:50
Question: For each concept in my dataset I have stored the corresponding Wikipedia categories. For example, consider the following 5 concepts and their corresponding Wikipedia categories:
hypertriglyceridemia: ['Category:Lipid metabolism disorders', 'Category:Medical conditions related to obesity']
enzyme inhibitor: ['Category:Enzyme inhibitors', 'Category:Medicinal chemistry', 'Category:Metabolism']
bypass surgery: ['Category:Surgery stubs', 'Category:Surgical procedures and techniques']
perth: [
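Below is a minimal sketch of one way to do the grouping, assuming the data sits in a plain Python dict mapping each concept to its category list (the variable names are hypothetical); concepts that share a category end up in the same group.

```python
from collections import defaultdict

# Hypothetical input: concept -> list of Wikipedia categories
concept_categories = {
    "hypertriglyceridemia": ["Category:Lipid metabolism disorders",
                             "Category:Medical conditions related to obesity"],
    "enzyme inhibitor": ["Category:Enzyme inhibitors",
                         "Category:Medicinal chemistry",
                         "Category:Metabolism"],
    "bypass surgery": ["Category:Surgery stubs",
                       "Category:Surgical procedures and techniques"],
}

# Invert the mapping: category -> concepts. Concepts that share at
# least one category land in the same bucket.
groups = defaultdict(list)
for concept, categories in concept_categories.items():
    for category in categories:
        groups[category].append(concept)

for category, members in groups.items():
    print(category, "->", members)
```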

Wikipedia Category Hierarchy from dumps

谁说胖子不能爱 submitted on 2019-12-03 10:20:01
Question: Using Wikipedia's dumps I want to build a hierarchy for its categories. I have downloaded the main dump (enwiki-latest-pages-articles) and the category SQL dump (enwiki-latest-category), but I can't find the hierarchy information. For example, the category SQL dump has entries for each category, but I can't find anything about how they relate to each other. The other dump (latest-pages-articles) lists the parent categories for each page, but in an unordered way. It just states all the parents
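For what it's worth, the subcategory relationships live in the categorylinks dump (enwiki-latest-categorylinks.sql.gz), not in the category table itself: rows with cl_type = 'subcat' link a subcategory page (cl_from, a page id) to its parent category title (cl_to). A rough sketch, assuming that dump and the page dump have been imported into a local MySQL database (the connection details and database name are placeholders):

```python
import pymysql

# Placeholder connection details; assumes the categorylinks and page
# dumps have been loaded into a local database called "enwiki".
conn = pymysql.connect(host="localhost", user="wiki", password="wiki",
                       database="enwiki", charset="utf8mb4")

# Each categorylinks row with cl_type = 'subcat' links a subcategory
# page (cl_from) to its parent category title (cl_to); joining against
# the page table turns the page id back into the subcategory's title.
query = """
    SELECT p.page_title AS child, cl.cl_to AS parent
    FROM categorylinks cl
    JOIN page p ON p.page_id = cl.cl_from
    WHERE cl.cl_type = 'subcat'
    LIMIT 20
"""
with conn.cursor() as cur:
    cur.execute(query)
    for child, parent in cur.fetchall():
        child = child.decode() if isinstance(child, bytes) else child
        parent = parent.decode() if isinstance(parent, bytes) else parent
        print(child, "-> subcategory of ->", parent)
conn.close()
```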

Wikipedia text download

大城市里の小女人 submitted on 2019-12-03 09:44:24
I am looking to download the full Wikipedia text for my college project. Do I have to write my own spider to download this, or is there a public dataset of Wikipedia available online? To give you some overview of my project: I want to find the interesting words of a few articles I am interested in. To find these interesting words, I am planning to apply tf/idf to calculate the term frequency for each word and pick the ones with high frequency. But to calculate the tf, I need to know the total occurrences in the whole of Wikipedia. How can this be done? from wikipedia: http://en.wikipedia.org
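A minimal sketch of the tf/idf step with scikit-learn, assuming the articles' plain text has already been extracted from the dump into local files (the file names are placeholders):

```python
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder file names: plain-text articles already extracted
# from the Wikipedia dump.
documents = [Path(name).read_text(encoding="utf-8")
             for name in ["article1.txt", "article2.txt", "article3.txt"]]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)   # shape: (n_docs, n_terms)
terms = vectorizer.get_feature_names_out()

# Top-scoring terms of the first article
row = tfidf[0].toarray().ravel()
top = row.argsort()[::-1][:10]
print([(terms[i], round(row[i], 3)) for i in top])
```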

Random seek in a 7z single-file archive

夙愿已清 submitted on 2019-12-03 09:03:07
Is it possible to do random access (a lot of seeks) into a very large file compressed by 7zip? The original file is very large (999 GB of XML) and I can't store it in unpacked format (I don't have that much free space). So, if the 7z format allows accessing a middle block without uncompressing all blocks before the selected one, I can build an index of block beginnings and the corresponding original-file offsets. The header of my 7z archive is
37 7A BC AF 27 1C 00 02 28 99 F1 9D 4A 46 D7 EA // 7z archive version 2; crc; n.hdr offset
00 00 00 00 44 00 00 00 00 00 00 00 F4 56 CF 92 // n.hdr offset; n.hdr size=44; crc
00 1E 1B
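For reference, the 32 bytes quoted above are the 7z signature header; a small sketch that parses them to locate the end header (this alone does not give per-block random access, but it is the starting point for reading the archive's stream index). The file name is a placeholder.

```python
import struct

# Read the fixed 32-byte 7z signature header:
#   6-byte magic, 2-byte version, 4-byte start-header CRC,
#   then next-header offset (u64), size (u64) and CRC (u32),
# all little-endian. The offset is counted from the end of
# this 32-byte signature header.
with open("archive.7z", "rb") as f:
    sig = f.read(32)

magic, ver_major, ver_minor, start_crc, nh_offset, nh_size, nh_crc = \
    struct.unpack("<6sBBIQQI", sig)

assert magic == b"7z\xbc\xaf\x27\x1c", "not a 7z archive"
print(f"format version {ver_major}.{ver_minor}")
print(f"next header at file offset {32 + nh_offset}, size {nh_size} bytes")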

How to extract data from a Wikipedia article?

雨燕双飞 submitted on 2019-12-03 07:52:42
Question: I have a question regarding parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (and also the JSON by replacing format=xml with format=json). But what I can't figure out is how to access only certain sections from the table of contents. What I want is: when the page is loaded, the user can press a button that makes a pop-up appear that displays
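A hedged sketch of one way to do this with action=parse: prop=sections lists the table of contents with section indexes, and a second request with section=N returns only that section's HTML. The article name here is a placeholder.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
article = "Cloud"  # placeholder article name

# 1) List the article's sections (its table of contents).
toc = requests.get(API, params={
    "action": "parse", "page": article,
    "prop": "sections", "format": "json",
}).json()["parse"]["sections"]

for s in toc:
    print(s["index"], s["line"])

# 2) Fetch the HTML of a single section by its index.
section_html = requests.get(API, params={
    "action": "parse", "page": article, "section": toc[0]["index"],
    "prop": "text", "format": "json",
}).json()["parse"]["text"]["*"]
print(section_html[:200])
```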

Reverse wikipedia geotagging lookup

倾然丶 夕夏残阳落幕 submitted on 2019-12-03 07:24:49
Question: Wikipedia geotags a lot of its articles. (Look in the top right corner of the page.) Is there any API for querying all geotagged pages within a specified radius of a geographical position? Update: Okay, based on lost-theory's answer I tried this (in the DBpedia query explorer): PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> SELECT ?subject ?label ?lat ?long WHERE { ?subject geo:lat ?lat. ?subject geo:long ?long. ?subject rdfs:label ?label. FILTER(xsd:float(?lat) - 57.03185 <= 0.05
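Besides the DBpedia route, the MediaWiki API's GeoData extension offers list=geosearch, which returns geotagged pages within a radius of a coordinate. A small sketch (the coordinate is just an example):

```python
import requests

# list=geosearch: pages geotagged within gsradius metres of gscoord.
resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query",
    "list": "geosearch",
    "gscoord": "57.03185|9.94062",   # "lat|lon", example coordinate
    "gsradius": 5000,                # metres
    "gslimit": 20,
    "format": "json",
}).json()

for page in resp["query"]["geosearch"]:
    print(page["title"], page["lat"], page["lon"], page["dist"])
```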

Wikipedia list=search REST API: how to also retrieve URLs of matching articles

為{幸葍}努か submitted on 2019-12-03 06:17:18
I'm studying the Wikipedia REST API but I'm not able to find the right option to also get URLs for a search query. This is the URL of the request: http://it.wikipedia.org/w/api.php?action=query&list=search&srsearch=calvino&format=xml&srprop=snippet This request outputs only the title and the snippet, but no URLs for the articles. I've checked the Wikipedia API documentation for the list=search query, but it seems there is no option to also get URLs. Best Regards, Fabio Buda
You can form the URL of the article easily by yourself from the title. For the Italian Wikipedia, it's http://it.wikipedia.org/wiki/
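A small sketch of that approach: run the same list=search query and build each hit's URL from its title (spaces become underscores, then percent-encode).

```python
from urllib.parse import quote

import requests

resp = requests.get("https://it.wikipedia.org/w/api.php", params={
    "action": "query", "list": "search", "srsearch": "calvino",
    "srprop": "snippet", "format": "json",
}).json()

for hit in resp["query"]["search"]:
    title = hit["title"]
    # Article URLs are just the title with spaces as underscores.
    url = "https://it.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))
    print(title, url)
```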

How to get a Wikipedia page in multiple languages?

雨燕双飞 submitted on 2019-12-03 05:17:54
Question: How can I get the same Wikipedia page in another language? For example, I want to get this page in Japanese: http://en.wikipedia.org/wiki/Cloud The result should be http://ja.wikipedia.org/wiki/雲 or only the title 雲. Is it possible to use the Wikipedia API or any other APIs to do this? Thank you.
Answer 1: The langlinks property of the MediaWiki API is probably what you want. Fetching the other languages for your Cloud example would look like this.
Source: https://stackoverflow.com/questions/4420584/how-to-get-wikipedia-page-in
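A minimal sketch of that langlinks query for the Cloud example:

```python
import requests

# prop=langlinks returns interlanguage links for a page; lllang
# restricts the result to a single target language.
resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query",
    "titles": "Cloud",
    "prop": "langlinks",
    "lllang": "ja",
    "format": "json",
}).json()

page = next(iter(resp["query"]["pages"].values()))
for link in page.get("langlinks", []):
    print(link["lang"], link["*"])   # e.g. "ja 雲"
```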

Wikipedia API full-text search to return articles with title, snippet and image

百般思念 submitted on 2019-12-03 04:32:51
Question: I've been looking for a way to query the Wikipedia API based on a search string for a list of articles with the following properties: title, snippet/description, and one or more images related to the article. I also have to make the query using JSONP. I've tried using the list=search parameter: http://en.wikipedia.org/w/api.php?action=query&list=search&prop=images&format=json&srsearch=test&srnamespace=0&srprop=snippet&srlimit=10&imlimit=1 But it seems to ignore prop=images. I've also tried
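One approach that is often suggested for this is to use generator=search instead of list=search, so that page-level props (thumbnail via pageimages, intro text via extracts) are attached to every hit in a single request; a hedged sketch (for JSONP you would add a callback parameter rather than decoding JSON directly):

```python
import requests

resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query",
    "generator": "search",        # search hits become the page set
    "gsrsearch": "test",
    "gsrlimit": 10,
    "prop": "pageimages|extracts",
    "piprop": "thumbnail",        # one thumbnail per article
    "exintro": 1,                 # intro text only
    "explaintext": 1,
    "exlimit": "max",
    "format": "json",
}).json()

for page in resp["query"]["pages"].values():
    thumb = page.get("thumbnail", {}).get("source")
    print(page["title"], thumb, (page.get("extract") or "")[:80])
```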

Searching Wikipedia using API

守給你的承諾、 submitted on 2019-12-03 03:10:05
Question: I want to search Wikipedia using the query action. I am using this URL: http://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=apple That works, but I want to get the first result of the search. How can I do that? Note: that URL works fine when there is only one result. I just need the title and a short description.
Answer 1: I don't think you can do both in one query. 1. To get the first result, use the Opensearch API. https://en.wikipedia.org/w/api.php?action
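A small sketch of the Opensearch route, which returns parallel lists of titles, descriptions and URLs, so the first result is simply index 0 of each list:

```python
import requests

resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "opensearch",
    "search": "apple",
    "limit": 1,
    "format": "json",
}).json()

# Response shape: [search_term, [titles], [descriptions], [urls]]
term, titles, descriptions, urls = resp
if titles:
    print(titles[0], descriptions[0], urls[0])
```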