wikipedia

How to get Wikipedia content as text by API?

徘徊边缘 submitted on 2021-02-08 08:52:11
Question: I want to get Wikipedia pages as text. I looked at the Wikipedia API at https://en.wikipedia.org/w/api.php which says that in order to get pages as text I need to append this to a page address: api.php?action=query&meta=siteinfo&siprop=namespaces&format=txt However, when I try appending this suffix to a normal page's address, the page is not found: https://en.wikipedia.org/wiki/George_Washington/api.php?action=query&meta=siteinfo&siprop=namespaces&format=txt Following the instructions
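One possible approach, sketched here in Python 3: instead of appending a suffix to the article address, call the api.php endpoint directly and pass the article title as a parameter. The sketch assumes the wiki has the TextExtracts extension enabled (which provides prop=extracts and explaintext); the helper names build_extract_url and extract_text are hypothetical and only for illustration.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title):
    """Build an API URL asking for the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",   # assumption: TextExtracts is available on the wiki
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the target article
        "format": "json",
        "formatversion": 2,   # "pages" comes back as a list
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_text(response):
    """Return the extract from a decoded JSON response, or None if the page is missing."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    return pages[0].get("extract")
```

To use it, fetch build_extract_url("George Washington") with urllib.request (sending a descriptive User-Agent header), decode the body with json.load, and pass the result to extract_text.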

How to reliably get the image used in the Wikipedia Infobox?

大憨熊 submitted on 2021-02-07 22:42:20
Question: How do I (reliably) get the main image(s) used in the Wikipedia Infobox from the API? This question has been asked before and the accepted answer admits that it is just a guess. Subsequent answers seem like a hack at best, and don't return the correct image. For instance, the Jimi Hendrix Wikipedia entry uses "File:Jimi Hendrix 1967.png" as the main image in the InfoBox. The updated answers suggest using this url but for Jimi Hendrix (and other topics) it often returns the wrong image. If I

How to reliably get the image used in the Wikipedia Infobox?

人盡茶涼 submitted on 2021-02-07 22:42:19
Question: How do I (reliably) get the main image(s) used in the Wikipedia Infobox from the API? This question has been asked before and the accepted answer admits that it is just a guess. Subsequent answers seem like a hack at best, and don't return the correct image. For instance, the Jimi Hendrix Wikipedia entry uses "File:Jimi Hendrix 1967.png" as the main image in the InfoBox. The updated answers suggest using this url but for Jimi Hendrix (and other topics) it often returns the wrong image. If I

API to retrieve info about famous people [closed]

谁说胖子不能爱 submitted on 2021-02-07 12:28:22
Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 8 years ago. I'm looking for some callable way to get information about famous people and celebrities. Given a string, I'd like to determine if it

How to scrape links from Wikipedia with Python

假如想象 submitted on 2021-01-28 07:23:39
Question: I am trying to scrape all the links to battles from the "List of Naval Battles" on Wikipedia using Python. The trouble is that I cannot figure out how to export all of the links containing the words "/wiki/Battle" to my CSV file. I am used to C++, so Python is kind of foreign to me. Any ideas? Here is what I have so far... from bs4 import BeautifulSoup import urllib2 rootUrl = "https://en.wikipedia.org/wiki/List_of_naval_battles" def get_soup(url,header): return BeautifulSoup( urllib2.urlopen
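The filtering and export step described in this question could be sketched as follows. This is not the asker's code (which uses BeautifulSoup and the Python 2 urllib2 module and is cut off in the excerpt); it is a Python 3 sketch that uses only the standard library, and the names BattleLinkParser, battle_links, and write_links_csv are hypothetical.

```python
import csv
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"


class BattleLinkParser(HTMLParser):
    """Collect href values of <a> tags that contain a given substring."""

    def __init__(self, needle="/wiki/Battle"):
        super().__init__()
        self.needle = needle
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep each matching link once, in document order.
        if href and self.needle in href and href not in self.links:
            self.links.append(href)


def battle_links(html):
    """Return the matching hrefs found in an HTML string."""
    parser = BattleLinkParser()
    parser.feed(html)
    return parser.links


def write_links_csv(links, out):
    """Write one absolute URL per row to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["url"])
    for href in links:
        writer.writerow([urljoin(BASE_URL, href)])
```

When writing to disk, open the output with open("battles.csv", "w", newline="") so the csv module controls line endings; the file name is only an example.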

How to scrape data from different Wikipedia pages?

怎甘沉沦 submitted on 2020-07-08 02:53:17
Question: I've scraped the Wikipedia table using Python BeautifulSoup (https://en.wikipedia.org/wiki/Districts_of_Hong_Kong). Besides the data the table offers (i.e. population, area, density and region), I would like to get the location coordinates for each district. That data has to come from each district's own page (the table links to them). Take the first district, 'Central and Western District', for example: the DMS coordinates (22°17′12″N 114°09′18″E) can be found on the page. By
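Once the DMS string has been pulled from a district page, converting it to decimal degrees is simple arithmetic: degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A minimal sketch, assuming the coordinates appear in exactly the format quoted in the question; the function name dms_to_decimal is hypothetical.

```python
import re

# Matches one component such as 22°17′12″N (prime and double-prime characters).
DMS_PATTERN = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])")


def dms_to_decimal(text):
    """Convert a 'lat lon' DMS string to a (latitude, longitude) tuple in decimal degrees."""
    values = []
    for degrees, minutes, seconds, hemisphere in DMS_PATTERN.findall(text):
        value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
        if hemisphere in "SW":
            value = -value  # south and west are negative by convention
        values.append(value)
    if len(values) != 2:
        raise ValueError("expected one latitude and one longitude in: %r" % text)
    return tuple(values)
```

For the example in the question, dms_to_decimal("22°17′12″N 114°09′18″E") gives a latitude of about 22.2867 and a longitude of about 114.155.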

Import English Wikipedia dump into SQL Server

非 Y 不嫁゛ submitted on 2020-07-08 00:47:47
Question: I've downloaded the latest English Wikipedia dump (enwiki-latest-pages-articles-multistream.xml) from here, and I'm trying to import it into SQL Server 2018. I can't open the XML file because it is over 75 GB, so I don't know what kind of tables I should create before working with Bulk XML. How can I do this? I can write a script in Python or C#. Thanks in advance! Answer 1: Use the following SQL query to create the database: Create Database Feed ; GO USE [Feed] drop table Doc drop
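Since the asker mentions Python as an option, one way to inspect a file too large to open is to stream it element by element rather than load it whole. The sketch below is not the answer's SQL approach; it assumes the usual dump layout in which each <page> holds a <title> and a <revision> with a <text> child, and the name iter_pages is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page>, keeping memory use roughly constant.

    source is a file name or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    title = text = None
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title = text = None
            root.clear()  # drop finished pages so the tree does not grow
```

Each yielded pair can then be batched into INSERT statements or written to a delimited file for bulk loading. If the dump is still compressed, passing bz2.open(path, "rb") as source should work as well.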

Parse birth and death dates from Wikipedia?

丶灬走出姿态 submitted on 2020-06-09 11:29:12
Question: I'm trying to write a Python program that can search Wikipedia for the birth and death dates of people. For example, Albert Einstein was born: 14 March 1879; died: 18 April 1955. I started with Fetch a Wikipedia article with Python import urllib2 opener = urllib2.build_opener() opener.addheaders = [('User-agent', 'Mozilla/5.0')] infile = opener.open('http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=Albert_Einstein&format=xml') page2 = infile
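The request in the question returns the wikitext of the article's lead section, so one option is to look for date templates in that text. The sketch below assumes the infobox uses templates of the form {{Birth date|...|YYYY|M|D}} and {{Death date and age|...|YYYY|M|D|...}}; many articles do, but not all, and nested templates inside the arguments are not handled. The name parse_life_dates is hypothetical.

```python
import re
from datetime import date

# Captures "birth" or "death" and the raw argument string of the template.
TEMPLATE_DATE = re.compile(
    r"\{\{\s*(birth|death)[ _]date[^|}]*\|([^}]*)\}\}", re.IGNORECASE
)


def parse_life_dates(wikitext):
    """Return a dict with 'birth' and/or 'death' dates found in the wikitext."""
    result = {}
    for kind, args in TEMPLATE_DATE.findall(wikitext):
        # Named options such as df=yes are skipped; only bare numbers are kept.
        numbers = [int(a) for a in (p.strip() for p in args.split("|")) if a.isdigit()]
        if len(numbers) >= 3:
            year, month, day = numbers[:3]
            result.setdefault(kind.lower(), date(year, month, day))
    return result
```

With wikitext containing {{Birth date|df=yes|1879|3|14}} and {{Death date and age|df=yes|1955|4|18|1879|3|14}}, the function returns 14 March 1879 and 18 April 1955, matching the dates quoted in the question.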

How to obtain a list of titles of all Wikipedia articles

元气小坏坏 submitted on 2020-05-09 19:31:56
Question: I'd like to obtain a list of the titles of all Wikipedia articles. I know there are two possible ways to get content from a Wikimedia-powered wiki. One would be the API and the other would be a database dump. I'd prefer not to download the wiki dump. First, it's huge, and second, I'm not really experienced with querying databases. The problem with the API, on the other hand, is that I couldn't figure out a way to retrieve only a list of the article titles, and even if it would need > 4
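For the API route, the query module list=allpages returns titles in batches, with a continue object in each response that is merged into the next request. A minimal sketch of that loop, assuming a caller-supplied fetch function (hypothetical) that sends the parameters to api.php and returns the decoded JSON; the name iter_article_titles is also hypothetical.

```python
def iter_article_titles(fetch, limit="max"):
    """Yield article titles by following the API's continuation tokens.

    fetch(params) must perform the HTTP request and return the decoded JSON dict.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": 0,                 # main (article) namespace only
        "apfilterredir": "nonredirects",  # skip redirect pages
        "aplimit": limit,
        "format": "json",
    }
    while True:
        response = fetch(params)
        for page in response["query"]["allpages"]:
            yield page["title"]
        if "continue" not in response:
            break
        # Merge the continuation values into the next request.
        params = {**params, **response["continue"]}
```

Because the generator is lazy, titles can be written to a file as they arrive. Walking all of English Wikipedia this way takes a very large number of requests, which is the trade-off against the dump that the question raises.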