wikipedia

Get Wikipedia page urls from an Excel list

Submitted by ↘锁芯ラ on 2019-12-25 03:12:05
Question: I have an issue where I am not making any progress. I am working on my master's thesis at the moment. For it I have a list of roughly 20,000 actors and need to check which of them have their own (German) Wikipedia page. Since I am not very experienced in VBA programming, I looked for a solution here in the forum. I found code that searches for URLs via Google and copies the first result into Excel: Using VBA in Excel to Google Search in IE and return the hyperlink
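Rather than scraping Google results, the MediaWiki API can answer "does this page exist?" directly. Below is a minimal Python sketch of that check, not the asker's VBA/IE approach; the batch size of 50 and the User-Agent string are assumptions, and normalized titles may differ slightly from the input spellings:

```python
import requests

API = "https://de.wikipedia.org/w/api.php"

def existing_pages(titles):
    """Return the subset of titles that exist on German Wikipedia."""
    existing = set()
    # The query API accepts up to 50 titles per request.
    for i in range(0, len(titles), 50):
        batch = titles[i:i + 50]
        r = requests.get(API, params={
            "action": "query",
            "titles": "|".join(batch),
            "redirects": 1,
            "format": "json",
        }, headers={"User-Agent": "thesis-page-check/0.1 (placeholder)"})
        r.raise_for_status()
        # Missing pages carry a "missing" key; existing ones have a pageid.
        # Note: the API may normalize titles; map back via the response's
        # "normalized"/"redirects" lists if exact matching matters.
        for page in r.json()["query"]["pages"].values():
            if "missing" not in page:
                existing.add(page["title"])
    return existing

print(existing_pages(["Til Schweiger", "Nonexistent Actor XYZ"]))
```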

Extract statistical information from Wikipedia article

Submitted by 十年热恋 on 2019-12-24 09:57:56
Question: I'm currently extracting data from DBpedia articles using SPARQLWrapper for Python, but I can't seem to find how to extract the number of watchers (and other statistical information) for a given article. Is there an easy way to achieve this? I don't mind if it's through DBpedia, or directly through Wikipedia (using wget, for example). Thanks for any advice.

Answer 1: It should be prohibited to get the number of watchers for every arbitrary article, as it is considered to be a security leak if
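For what it's worth, MediaWiki's API does expose a watcher count through prop=info with inprop=watchers, but it withholds the figure for pages with few watchers, which is exactly the leak the (truncated) answer alludes to. A minimal Python sketch; the User-Agent string is a placeholder:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def watcher_count(title):
    """Return the watcher count MediaWiki is willing to reveal, or None."""
    r = requests.get(API, params={
        "action": "query",
        "prop": "info",
        "inprop": "watchers",
        "titles": title,
        "format": "json",
    }, headers={"User-Agent": "watch-stats-demo/0.1 (placeholder)"})
    r.raise_for_status()
    page = next(iter(r.json()["query"]["pages"].values()))
    # 'watchers' is simply omitted for pages below the wiki's
    # unwatched-page threshold, so treat None as "not disclosed".
    return page.get("watchers")

print(watcher_count("Python (programming language)"))
```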

Problem in Wikipedia API

Submitted by 徘徊边缘 on 2019-12-24 09:57:29
Question: I have a problem using the Wikipedia API. I use this PHP script:

```php
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=content&format=xml");
print $xmlDoc->saveXML();
?>
```

and I get the following result in the browser. Why?

Warning: DOMDocument::load(http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=content&format=xml) [domdocument.load]: failed to open stream: HTTP
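The warning is cut off before the HTTP status, so the exact failure is unknown; one common cause is that Wikipedia's API rejects clients that send no descriptive User-Agent header, and PHP's stream wrapper sends none by default. A sketch of the same query from Python with an explicit User-Agent (the header value is a placeholder, and the missing-UA diagnosis is an assumption):

```python
import requests
import xml.dom.minidom

params = {
    "action": "query",
    "prop": "revisions",
    "titles": "New_York_Yankees",
    "rvprop": "content",
    "format": "xml",
}
# Wikipedia expects a descriptive User-Agent; anonymous default
# clients are a frequent cause of "failed to open stream" errors.
r = requests.get("https://en.wikipedia.org/w/api.php", params=params,
                 headers={"User-Agent": "api-demo/0.1 (you@example.com)"})
r.raise_for_status()
print(xml.dom.minidom.parseString(r.content).toprettyxml()[:500])
```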

API for getting edits on Wikipedia

Submitted by 两盒软妹~` on 2019-12-24 09:48:42
Question: I want to get the text of an edit made on a Wikipedia page, before and after the edit. I have this URL: https://en.wikipedia.org/w/index.php?diff=328391582&oldid=328391343 But I want the text in JSON format so that I can use it directly in my program. Is there any API provided by MediaWiki that gives me the old and new text after an edit, or do I have to parse the HTML page using a parser?

Answer 1: Try this: https://www.mediawiki.org/wiki/API:Revisions There are a few options which may be of
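Following the linked API:Revisions documentation, both revision IDs from the question's URL can be fetched as JSON in a single request (action=compare can likewise return the rendered diff itself). A minimal Python sketch using those IDs:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def revision_texts(old_id, new_id):
    """Fetch the wikitext of two revisions: the before/after of one edit."""
    r = requests.get(API, params={
        "action": "query",
        "prop": "revisions",
        "revids": f"{old_id}|{new_id}",
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
    }, headers={"User-Agent": "diff-demo/0.1 (placeholder)"})
    r.raise_for_status()
    # Both revisions belong to the same page, so there is one page entry.
    page = next(iter(r.json()["query"]["pages"].values()))
    return {rev["revid"]: rev["slots"]["main"]["*"]
            for rev in page["revisions"]}

texts = revision_texts(328391343, 328391582)
old_text, new_text = texts[328391343], texts[328391582]
print(len(old_text), len(new_text))
```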

How to use XPath or xgrep to find information in Wikipedia?

Submitted by 半腔热情 on 2019-12-24 09:24:22
Question: I'd like to scrape some (not much) info from Wikipedia. Say I have a list of universities and their Wikipedia pages. Can I use an XPath expression to find the website (domain) of a university? So for instance, if I get the page

curl http://en.wikipedia.org/wiki/Vienna_University_of_Technology

the XPath expression should find the domain: http://www.tuwien.ac.at Ideally, this should work with the Linux xgrep command-line tool, or equivalent.

Answer 1: With the h prefix bound to http://www.w3.org/1999
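The answer is cut off mid-namespace, but the same idea works from Python with lxml. The XPath below assumes the site link sits in the infobox row labelled "Website", which is a page-layout convention rather than a guarantee:

```python
import requests
from lxml import html

def university_website(title):
    """Best-effort: pull the 'Website' link out of a Wikipedia infobox."""
    r = requests.get(f"https://en.wikipedia.org/wiki/{title}",
                     headers={"User-Agent": "infobox-demo/0.1 (placeholder)"})
    r.raise_for_status()
    tree = html.fromstring(r.content)
    # Infobox markup varies between articles; this targets the common
    # <th>Website</th><td><a href=...></td> row shape.
    hrefs = tree.xpath(
        '//table[contains(@class, "infobox")]'
        '//th[contains(., "Website")]/following-sibling::td//a/@href')
    return hrefs[0] if hrefs else None

print(university_website("Vienna_University_of_Technology"))
```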

How to query and save link in array to call later?

Submitted by 旧时模样 on 2019-12-24 08:23:24
Question: I am doing a query on Wikipedia to get a snippet and the title with the link. Then, if I click a title, I would like to get the full article in a modal. I am trying to get a different article for each link returned by the first query. Here is a JsFiddle:

```javascript
$("#wiki").on('click', function(e) {
  var articleName = $(this).data('subject');
  $.getJSON("https://it.wikipedia.org/w/api.php?callback=?", {
    srsearch: articleName,
    action: "query",
    list: "search",
    format: "json"
  }, function(data) {
    $("
```
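The jQuery snippet is truncated, but the two-step flow it aims at (search for titles, then fetch one article per clicked title) maps directly onto list=search followed by prop=extracts. A Python sketch of that flow against the same it.wikipedia.org endpoint; the search term, limit, and User-Agent are placeholders:

```python
import requests

API = "https://it.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "search-then-fetch-demo/0.1 (placeholder)"}

def search(term, limit=5):
    """Step 1: list=search returns matching titles (plus snippets)."""
    r = requests.get(API, params={
        "action": "query", "list": "search",
        "srsearch": term, "srlimit": limit, "format": "json",
    }, headers=HEADERS)
    r.raise_for_status()
    return [hit["title"] for hit in r.json()["query"]["search"]]

def article_extract(title):
    """Step 2: fetch the article body for one clicked title."""
    r = requests.get(API, params={
        "action": "query", "prop": "extracts",
        "explaintext": 1, "titles": title, "format": "json",
    }, headers=HEADERS)
    r.raise_for_status()
    page = next(iter(r.json()["query"]["pages"].values()))
    return page.get("extract", "")

for title in search("Roma"):
    print(title, "->", article_extract(title)[:80])
```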

Scraping Wikipedia tables with Python selectively

Submitted by 一世执手 on 2019-12-24 07:52:28
Question: I am having trouble parsing a wiki table and hope someone who has done it before can give me advice. From List_of_current_heads_of_state_and_government I need the countries (which works with the code below) and then only the first mention of the head of state plus their name. I am not sure how to isolate the first mention, as they all come in one cell. And my attempt to pull the names gives me this error: IndexError: list index out of range. Will appreciate your help! import requests from bs4 import
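The question's code is cut off after its imports, so the sketch below is a reconstruction rather than a fix of the original: it splits each head-of-state cell on line breaks and keeps only the first entry, and it guards the cell indexing that likely raised the IndexError. The column positions in that table are an assumption:

```python
import requests
from bs4 import BeautifulSoup

URL = ("https://en.wikipedia.org/wiki/"
       "List_of_current_heads_of_state_and_government")

resp = requests.get(URL, headers={"User-Agent": "table-demo/0.1 (placeholder)"})
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

table = soup.find("table", class_="wikitable")
for row in table.find_all("tr")[1:]:
    cells = row.find_all(["th", "td"])
    if len(cells) < 2:   # spanned rows have fewer cells: skip instead of crashing
        continue
    country = cells[0].get_text(" ", strip=True)
    # A head-of-state cell can hold several office/name entries; joining
    # with "\n" keeps them apart so only the first mention is taken.
    first_mention = cells[1].get_text("\n", strip=True).split("\n")[0]
    print(country, "-", first_mention)
```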

Script always gets a 302 response when pulling random pages from Wikipedia

Submitted by ぐ巨炮叔叔 on 2019-12-24 05:09:12
Question: I can pull any page from Wikipedia with

```python
import httplib
conn = httplib.HTTPConnection("en.wikipedia.org")
conn.debuglevel = 1
conn.request("GET", "/wiki/Normal_Distribution",
             headers={'User-Agent': 'Python httplib'})
r1 = conn.getresponse()
r1.read()
```

The normal response will be:

reply: 'HTTP/1.0 200 OK\r\n'
header: Date: Sun, 03 Apr 2011 23:49:36 GMT
header: Server: Apache
header: Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
header: Content-Language: en
header: Vary: Accept
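Assuming the random pages come from /wiki/Special:Random, the 302 is expected rather than an error: that endpoint answers every request with a redirect to a concrete article, and httplib never follows redirects by itself, so the client must read the Location header and issue a second request. A sketch in Python 3, where http.client is the renamed httplib from the question:

```python
import http.client
from urllib.parse import urlparse

HEADERS = {"User-Agent": "random-page-demo/0.1 (placeholder)"}

conn = http.client.HTTPSConnection("en.wikipedia.org")
conn.request("GET", "/wiki/Special:Random", headers=HEADERS)
resp = conn.getresponse()
resp.read()  # drain the body so the connection can be reused

# A 302 here is normal: Special:Random redirects to a random article,
# and http.client leaves redirect handling to the caller.
if resp.status in (301, 302):
    # Location is an absolute URL; request only its path component.
    target = urlparse(resp.getheader("Location")).path
    conn.request("GET", target, headers=HEADERS)
    resp = conn.getresponse()
    print(resp.status, target)
```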

Importing wikipedia-dump to SQL-base

Submitted by 微笑、不失礼 on 2019-12-24 03:25:32
Question: I have a problem with MySQL. I wanted to import a Wikipedia dump into my MediaWiki (local server "Denwer"). First, I used MWDumper (mwdumper.jar) to convert the XML dump to an SQL file (for testing I used the simplewiki dump, which is small, ~92 MB). Then I imported the SQL file into my database from the command line, using mysql.exe, with these commands:

```
mysql.exe --user=root --password=
use wikidb
source X:[path to dump.sql]\dump.sql
```

The process runs normally for a long time (message: Query OK). But at some point, I get an
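The error message itself is cut off above, so no specific fix can be offered. As an aside, MWDumper's documented usage can also skip the intermediate .sql file entirely by piping its output straight into mysql. A Python sketch of driving that pipeline; the jar location, dump filename, credentials, and database name are all placeholders:

```python
import subprocess

# Placeholder filenames/credentials -- adjust for your setup.
DUMP = "simplewiki-latest-pages-articles.xml.bz2"

# Equivalent of the documented shell pipeline:
#   java -jar mwdumper.jar --format=sql:1.5 DUMP | mysql -u root wikidb
dumper = subprocess.Popen(
    ["java", "-jar", "mwdumper.jar", "--format=sql:1.5", DUMP],
    stdout=subprocess.PIPE)
mysql = subprocess.Popen(
    ["mysql", "--user=root", "--password=", "wikidb"],
    stdin=dumper.stdout)
dumper.stdout.close()  # lets mwdumper receive SIGPIPE if mysql exits early
mysql.communicate()
```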