wikipedia

Wikipedia module for Python: skipping “wikipedia.exceptions.PageError”

Submitted by 允我心安 on 2020-01-07 03:13:09
Question: I'm trying to associate to each species name listed in a CSV file its Wikipedia summary and main image. I wrote this code:

```python
import csv
import wikipedia

wikipedia.set_lang('it')
with open('D:\\GIS\\Dati\\Vinca\\specie_vinca.csv', 'rt', encoding="utf8") as f:
    reader = csv.reader(f)
    for row in reader:
        wikipage = wikipedia.page(row)
        print(wikipage.title)
        print(wikipage.summary)
        print("Page URL: %s" % wikipage.url)
        print("Nr. of images on page: %d" % len(wikipage.images))
        print(" - Main Image:
```
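A minimal sketch of the skip-on-error pattern the title asks about. The `collect_pages` helper and the dict-backed demo lookup are stand-ins of mine, not part of the `wikipedia` library: in real use `lookup` would be `wikipedia.page` (called with `row[0]`, since `csv.reader` yields lists, not strings) and `errors` would include `wikipedia.exceptions.PageError`.

```python
def collect_pages(names, lookup, errors=(KeyError,)):
    """Try lookup(name) for each name; skip names whose lookup raises.

    In real use, lookup would be wikipedia.page and errors would be
    (wikipedia.exceptions.PageError, wikipedia.exceptions.DisambiguationError).
    """
    found, skipped = [], []
    for name in names:
        try:
            found.append(lookup(name))
        except errors:
            skipped.append(name)
    return found, skipped

# Demo with a stand-in lookup: a dict raises KeyError for unknown names,
# mimicking PageError for species that have no Wikipedia page.
pages = {"Vinca minor": "summary A", "Vinca major": "summary B"}
found, skipped = collect_pages(
    ["Vinca minor", "No such species", "Vinca major"],
    pages.__getitem__,
)
```

The loop keeps going past missing pages instead of aborting, and the `skipped` list lets you report which CSV rows found no article.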

Wikipedia page parsing program caught in endless graph cycle

Submitted by 不打扰是莪最后的温柔 on 2020-01-05 09:35:34
Question: My program is caught in a cycle that never ends, and I can't see how it got into this trap or how to avoid it. It's parsing Wikipedia data, and I think it's just following a connected component around and around. Maybe I can store the pages I've already visited in a set, and skip any page that is already in that set? This is my project; it's quite small, only three short classes. This is a link to the data it generates; I stopped it short, otherwise it would have gone on and on. This is
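The visited-set idea the asker suggests is indeed the standard fix. A self-contained sketch on a toy link graph (the graph and function names are mine, not from the asker's project):

```python
from collections import deque

def crawl(start, neighbors):
    """Breadth-first traversal that records visited nodes, so a cycle
    (e.g. page A links to B, B links to C, C links back to A) is
    walked exactly once instead of forever."""
    visited = set()
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        if node in visited:
            continue          # already expanded: this is what breaks the cycle
        visited.add(node)
        order.append(node)
        for nxt in neighbors.get(node, []):
            if nxt not in visited:
                queue.append(nxt)
    return order

# Toy link graph with a cycle: A -> B -> C -> A
links = {"A": ["B"], "B": ["C"], "C": ["A"]}
order = crawl("A", links)
```

Without the `visited` check, the queue would grow with "A" again after "C" and the traversal would loop indefinitely.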

How can I load content from another site onto mine with JavaScript/jQuery?

Submitted by 荒凉一梦 on 2020-01-05 06:37:14
Question: I'm trying to get a Wikipedia article to load onto my site, following the instructions here: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Transwiki but I'm at a loss. I've tried:

```javascript
var xyz = document.getElementById(url("http://en.wikipedia.org/w/index.php?title=Special:Export&history=1&action=submit&pages=Albert_einstein")
var xyz = $('#xyz').load('http://en.wikipedia.org/w/index.php?title=Special:Export&history=1&action=submit&pages=Albert_einstein');
document.write(xyz);
```

Answer 1:
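For context (this is my sketch, not the truncated answer above): browsers block cross-origin `.load()` of wikipedia.org pages, and Special:Export returns raw XML with no CORS headers. The MediaWiki API does support cross-origin reads via its `origin=*` parameter with `action=parse`, which returns a page's rendered HTML as JSON. A small helper that builds such a request URL:

```python
from urllib.parse import urlencode

def wikipedia_parse_url(title, lang="en"):
    """Build a MediaWiki API URL returning a page's rendered HTML as JSON.

    origin=* enables anonymous cross-origin requests, so the resulting URL
    can also be fetched from browser JavaScript (e.g. with fetch or
    $.getJSON), unlike the Special:Export endpoint in the question."""
    params = {
        "action": "parse",
        "page": title,
        "format": "json",
        "origin": "*",
    }
    return "https://%s.wikipedia.org/w/api.php?%s" % (lang, urlencode(params))

url = wikipedia_parse_url("Albert Einstein")
```

The HTML of the article then sits under `parse.text` in the JSON response.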

no content from Wikipedia API search

Submitted by 家住魔仙堡 on 2020-01-05 03:49:27
Question: Good morning. I am using the following API search, which used to return the title, content and link of a Wikipedia entry: https://it.wikipedia.org/w/api.php?action=opensearch&search=alessandro%20leogrande&format=json&utf8=1 Just recently I noticed that it always returns an empty content part (`[""]`):

```json
["alessandro leogrande",["Alessandro Leogrande"],[""],["https://it.wikipedia.org/wiki/Alessandro_Leogrande"]]
```

Can you please give me any insight?

Answer 1: It seems there is a problem with the
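One possible workaround (my sketch, not the truncated answer): instead of `opensearch`, use `action=query` with a search generator and the `extracts` property, which still returns intro text alongside title and URL. The helper below only builds the request URL; fetching it is left to the caller.

```python
from urllib.parse import urlencode

def extracts_url(search_term, lang="it", limit=5):
    """Build an action=query URL returning title, plain-text intro extract
    and canonical URL for pages matching a search, replacing the empty
    description slot of the opensearch endpoint."""
    params = {
        "action": "query",
        "generator": "search",
        "gsrsearch": search_term,
        "gsrlimit": limit,
        "prop": "extracts|info",
        "exintro": 1,        # only the lead section
        "explaintext": 1,    # plain text, not HTML
        "inprop": "url",     # include fullurl for each page
        "format": "json",
    }
    return "https://%s.wikipedia.org/w/api.php?%s" % (lang, urlencode(params))

url = extracts_url("alessandro leogrande")
```

Each page object in the JSON response then carries `title`, `extract` and `fullurl` fields.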

Equals signs in Wikipedia template parameters won't display properly

Submitted by |▌冷眼眸甩不掉的悲伤 on 2020-01-03 16:38:29
Question: I've noticed that links containing equals signs don't display properly when placed inside the {{missing information}} template. Is there any way to work around this limitation so that links with equals signs can be included inside MediaWiki templates?

```wikitext
{{missing information|[https://www.google.com/search?q=google+search+test This link has an equals sign in it, and the template is not displaying properly.]}}
{{missing information|[https://www.google.com/ This link
```
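For context, a sketch of the two standard MediaWiki workarounds. An unnamed parameter containing `=` is parsed as `name=value`, so everything before the first `=` is taken as a parameter name; either name the positional parameter explicitly as `1=`, or replace each literal equals sign with the `{{=}}` template/magic word (which expands back to `=`). Exact behavior can vary with the wiki's installed templates:

```wikitext
{{missing information|1=[https://www.google.com/search?q=google+search+test Naming the parameter explicitly makes the equals sign literal.]}}

{{missing information|[https://www.google.com/search?q{{=}}google+search+test Or escape each equals sign with {{=}}.]}}
```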

Is it possible to read Wikipedia using Python requests library?

Submitted by 大兔子大兔子 on 2020-01-03 02:56:10
Question: To read content from a given URL I do the following:

```python
import requests
proxies = {'http': 'http://user:pswd@foo-webproxy.foo.com:7777'}
url = 'http://example.com/foo/bar'
r = requests.get(url, proxies=proxies)
print r.text.encode('utf-8')
```

And it works fine! I get the content. However, if I use another URL:

```python
url = 'https://en.wikipedia.org/wiki/Mestisko'
```

it does not work. I get an error message that starts with:

```
requests.exceptions.ConnectionError: ('Connection aborted.', error(10060
```

Is
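A likely culprit worth checking (my reading of the excerpt, not a confirmed answer): the `proxies` dict only has an `'http'` entry, and `requests` selects the proxy by URL scheme, so the `https://` request to Wikipedia goes out directly and a corporate firewall then times it out (Windows error 10060). Routing both schemes through the same proxy:

```python
def proxy_map(proxy_url):
    """requests picks a proxy by the target URL's scheme; supplying only
    an 'http' entry (as in the question) leaves https:// requests
    unproxied. Map both schemes to the same forward proxy."""
    return {"http": proxy_url, "https": proxy_url}

# The credentials and host below are the question's placeholders, not real.
proxies = proxy_map("http://user:pswd@foo-webproxy.foo.com:7777")

# Usage, assuming the proxy is reachable and allows CONNECT tunnelling:
# import requests
# r = requests.get("https://en.wikipedia.org/wiki/Mestisko",
#                  proxies=proxies, timeout=10)
```

An `http://`-scheme proxy URL is normal for HTTPS targets: the client opens a CONNECT tunnel through the proxy rather than speaking TLS to the proxy itself.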

Blacklist IP database

Submitted by 主宰稳场 on 2019-12-29 18:36:20
Question: Is there an open database of blacklisted IPs for the web? One covering the many public web proxies out there, such as the blacklist used by Wikipedia's global blocking.

Answer 1: Project Honey Pot provides a service called Http:BL. As an active member of Project Honey Pot you can query their database of IPs that are known email-address harvesters or comment spammers.

Answer 2: You can use the Blacklist IP Addresses Live Database from myip.ms - http://myip.ms/browse/blacklist/Blacklist_IP_Blacklist_IP
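To make Answer 1 concrete: Http:BL is queried over DNS, by resolving a name built from your access key, the suspect IP with its octets reversed, and the list zone; a `127.<days>.<threat>.<type>` answer means the IP is listed, while NXDOMAIN means it is not. A sketch of the query-name construction and answer decoding (the access key below is made up, and the field interpretation should be checked against Project Honey Pot's docs):

```python
def httpbl_query_name(ip, access_key):
    """Hostname to resolve against Project Honey Pot's Http:BL:
    <access-key>.<reversed-ip-octets>.dnsbl.httpbl.org"""
    reversed_ip = ".".join(reversed(ip.split(".")))
    return "%s.%s.dnsbl.httpbl.org" % (access_key, reversed_ip)

def parse_httpbl_answer(answer_ip):
    """Decode a 127.<days>.<threat>.<type> DNS answer; None = not listed."""
    octets = [int(o) for o in answer_ip.split(".")]
    if octets[0] != 127:
        return None
    days_since_seen, threat_score, visitor_type = octets[1:]
    return {"days": days_since_seen, "threat": threat_score,
            "type": visitor_type}

# Hypothetical key and documentation-range IP, for illustration only:
name = httpbl_query_name("203.0.113.7", "abcdefghijkl")
# In real use: socket.gethostbyname(name), catching the NXDOMAIN error
# for unlisted IPs.
```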

Indexing wikipedia with solr

Submitted by 半城伤御伤魂 on 2019-12-29 09:16:06
Question: I've installed Solr 4.6.0 and followed the tutorial available on Solr's home page. Everything was fine until I had to do the real job: I need fast access to Wikipedia content and was advised to use Solr. I was trying to follow the example at http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia, but I couldn't get it to work. I am a newbie, and I don't know what data_config.xml means!

```xml
<dataConfig> <dataSource type=
```
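For orientation: `data_config.xml` is the DataImportHandler's own configuration file, referenced from a `requestHandler` entry in `solrconfig.xml`; it tells DIH where the Wikipedia XML dump lives and which XPath expressions map dump elements to Solr fields. A sketch along the lines of the wiki example (the dump path is a placeholder, and field names must match your schema):

```xml
<dataConfig>
  <!-- Read the dump from the local filesystem -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Stream the dump and emit one Solr document per <page> element -->
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/path/to/enwiki-pages-articles.xml">
      <field column="id"    xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="text"  xpath="/mediawiki/page/revision/text" />
    </entity>
  </document>
</dataConfig>
```

Once this file is in place and wired into `solrconfig.xml`, the import is started by hitting the handler with `command=full-import`.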