wikipedia

Retrieve another language of a Wikipedia page

故事扮演, submitted on 2019-11-30 17:52:53
Question: Task: We have an English Wikipedia page and need to retrieve the address of the same page in Russian. I know the Semantic Web solution (use a simple query to DBpedia), but I am curious whether there are traditional solutions. I asked the same question on semanticoverflow.com, where Toby Inkster suggested parsing the results of http://en.wikipedia.org/wiki/Colugo?action=raw (the links to other languages are at the bottom), but this way is too inefficient. Are there any other ways, or is DBpedia the one real
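One traditional route, sketched here rather than taken from the thread, is the MediaWiki API's `prop=langlinks` query, which returns interlanguage links without parsing raw wikitext. The helper names `langlink_url` and `extract_langlink` are hypothetical, and the response shape assumed is the API's default JSON format, where the linked title sits under the `"*"` key.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint.
API = "https://en.wikipedia.org/w/api.php"


def langlink_url(title: str, lang: str) -> str:
    """Build a langlinks query for one page and one target language."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": lang,  # restrict the result to this language code
        "format": "json",
    }
    return API + "?" + urlencode(params)


def extract_langlink(response: dict, lang: str) -> Optional[str]:
    """Return the page title in `lang` from a decoded JSON response, if any."""
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            if link.get("lang") == lang:
                return link.get("*")
    return None
```

Fetching `langlink_url("Colugo", "ru")` with any HTTP client and passing the decoded JSON to `extract_langlink` should give the Russian title, from which the address on ru.wikipedia.org follows.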

Get all Wikipedia Infobox Templates and all Pages using them

旧巷老猫, submitted on 2019-11-30 12:43:22
Given a Wikipedia page like Wikipedia: Stack Overflow, there are often infoboxes (mostly at the top right of the page). DBpedia lists all these attributes as RDF triples. You can see the example at DBpedia: Stack Overflow. There you see the property dbpprop:wikiPageUsesTemplate with the value dbpedia:Template:Infobox_website, which is interesting. I want to know which Wikipedia pages use this template. How can I do that and list all pages which use the Infobox_website template? Preferably with a SPARQL query, but I am open to other easy solutions. Next thing is a
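A possible SPARQL approach is sketched below as Python that only builds the query text and a request URL. It assumes that the `dbpprop:` and `dbpedia:` prefixes expand to `http://dbpedia.org/property/` and `http://dbpedia.org/resource/`, that the public endpoint is `http://dbpedia.org/sparql`, and that the endpoint accepts a `format` parameter; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumptions: public DBpedia endpoint and the usual prefix expansions.
ENDPOINT = "http://dbpedia.org/sparql"
USES_TEMPLATE = "http://dbpedia.org/property/wikiPageUsesTemplate"
RESOURCE = "http://dbpedia.org/resource/"


def template_users_query(template: str, limit: int = 100) -> str:
    """Build a SPARQL query listing pages that use the given template."""
    return (
        "SELECT ?page WHERE { "
        f"?page <{USES_TEMPLATE}> <{RESOURCE}Template:{template}> "
        "} "
        f"LIMIT {limit}"
    )


def sparql_url(query: str) -> str:
    """Encode the query into a GET URL for the endpoint."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)
```

Full IRIs are used instead of prefixed names so that the colon inside `Template:Infobox_website` cannot trip up a parser.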

Does the Wikipedia API support searches for a specific template?

限于喜欢, submitted on 2019-11-30 08:43:27
Is it possible to query the Wikipedia API for articles that contain a specific template? The documentation does not describe any action that would filter search results to pages that contain a template. Specifically, I am after pages that contain Template:Persondata. After that, I am hoping to be able to retrieve just that specific template in order to populate genealogy data for the openancestry.org project. The query below shows that the Albert Einstein page contains the Persondata template, but it doesn't return the contents of the template, and I don't know how to get a list of page
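For listing pages that transclude a template, the MediaWiki API offers `list=embeddedin`. The sketch below is an illustration, not the answer from the thread: the helper names are hypothetical, and it assumes the default JSON response shape with a `continue.eicontinue` token for paging.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint.
API = "https://en.wikipedia.org/w/api.php"


def embeddedin_url(template: str, limit: int = 50,
                   cont: Optional[str] = None) -> str:
    """Build a query for article-namespace pages that transclude `template`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,   # e.g. "Template:Persondata"
        "einamespace": 0,      # articles only
        "eilimit": limit,
        "format": "json",
    }
    if cont:
        params["eicontinue"] = cont  # resume from a previous batch
    return API + "?" + urlencode(params)


def titles_and_continue(response: dict) -> Tuple[List[str], Optional[str]]:
    """Pull page titles and the continuation token out of a decoded response."""
    pages = response.get("query", {}).get("embeddedin", [])
    titles = [page["title"] for page in pages]
    cont = response.get("continue", {}).get("eicontinue")
    return titles, cont
```

Calling `embeddedin_url` again with the returned token until it comes back as `None` walks through the whole list.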

How would you handle different formats of dates?

▼魔方 西西, submitted on 2019-11-30 06:05:15
Question: I have dates in many different formats, like: 27 - 28 August 663 CE; 22 August 1945; 19 May; May 4 1945 – August 22 1945; 5/4/1945; 2-7-1232; 03-4-1020; 1/3/1 (year 1); 09/08/0 (year 0). Note they are all in different formats and different orders; some have two months, some only one. I tried to use moment js with no results, and I also tried date js, with no luck. I tried to do some splitting: dates.push({ Time : [] }); function doSelect(text) { return $wikiDOM.find(".infobox th").filter(function() {

Get first lines of Wikipedia Article

佐手、, submitted on 2019-11-30 05:16:38
I have a Wikipedia article and I want to fetch the first z lines (or the first x chars, or the first y words, it doesn't matter) from the article. The problem: I can get either the source wikitext (via the API) or the parsed HTML (via a direct HTTP request, possibly on the print version), but how can I find the first lines displayed? Normally the source (both HTML and wikitext) starts with the infoboxes and images, and the first real text to display is somewhere further down in the code. For example: Albert Einstein on Wikipedia (print version). Look in the code, the first real-text-line "Albert Einstein
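One way to sidestep the infobox problem, offered here as a sketch and not as the thread's answer, is the `prop=extracts` query of the TextExtracts extension, which can return the lead section as plain text. The helper names are hypothetical; the parameters `exintro`, `explaintext`, and `exsentences` are the extension's own.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint.
API = "https://en.wikipedia.org/w/api.php"


def intro_url(title: str, sentences: int = 3) -> str:
    """Build a query for the first few plain-text sentences of an article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,              # only the text before the first heading
        "explaintext": 1,          # plain text instead of HTML
        "exsentences": sentences,  # how many sentences to return
        "titles": title,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def first_extract(response: dict) -> Optional[str]:
    """Return the extract of the first page that has one, else None."""
    for page in response.get("query", {}).get("pages", {}).values():
        if "extract" in page:
            return page["extract"]
    return None
```

Because the extract is generated from the rendered lead section, infoboxes and images at the top of the source do not appear in it.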

Wikipedia API: how to search for a term in a specific category

半世苍凉, submitted on 2019-11-30 05:13:29
I'm having a hard time figuring out a basic task: how to find a term restricted to a specific category. I feel the Wiki API documentation is kind of confusing. I'd just like to receive as output a JSON file with all the suggestions related to that term. For example, I search for Matrix in the category movies, so I get The Matrix 1, The Matrix 2, etc., excluding math results and so on. Thanks. I feel your pain bro, try something like: http://en.wikipedia.org/w/api.php?action=query&list=search&format=jsonfm&srsearch=matrix+incategory:English-language_films Change the above format from jsonfm to json for real json
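The URL in the answer can also be assembled programmatically. This is a minimal sketch with hypothetical helper names; it assumes the default JSON response shape, where hits are listed under `query.search`, and it uses `format=json` directly as the answer recommends.

```python
from typing import List
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in the answer.
API = "http://en.wikipedia.org/w/api.php"


def category_search_url(term: str, category: str) -> str:
    """Build a full-text search restricted to one category."""
    params = {
        "action": "query",
        "list": "search",
        "format": "json",
        # Category names use underscores here, as in the answer's URL.
        "srsearch": f"{term} incategory:{category}",
    }
    return API + "?" + urlencode(params)


def result_titles(response: dict) -> List[str]:
    """List the titles of the search hits in a decoded response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]
```

`urlencode` writes the colon as `%3A`, which the server decodes back to the same `srsearch` value as in the hand-written URL.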

Use freebase data on local server?

最后都变了-, submitted on 2019-11-30 02:24:43
Are there any existing ways of using the Freebase data dumps to create a database similar to what Freebase offers, but on your own server? Pretty much Freebase, but locally and not through the API? I guess it would be possible to create, but are there any existing solutions for this already? Or any alternative solutions for similar data without using an API? I didn't find this for DBpedia either :| Take a look at the freebase-quad-rdfize project on Google Code. It should allow you to download the weekly Freebase quad dump and load it into the RDF triple store of your choice. An alternative to

Blacklist IP database

别说谁变了你拦得住时间么, submitted on 2019-11-29 23:15:13
Is there an open database of blacklisted IPs for the Web, with a lot of public web proxies, such as the blacklist used for the global blocking on Wikipedia? Project Honeypot provides a service called Http:BL. As an active member of Project Honeypot you can query their database of IPs that are known as email address harvesters or comment spammers. You can use the Blacklist IP Addresses Live Database from myip.ms - http://myip.ms/browse/blacklist/Blacklist_IP_Blacklist_IP_Addresses_Live_Database_Real-time They have the latest blacklisted IPs collected during the last 10 days for use in .htaccess

RegEx needed for Wikipedia infobox

爷,独闯天下, submitted on 2019-11-29 22:17:28
Question: OK, so here's what I need: we have the full XML of a Wikipedia article, and we need just the Infobox section. I have tried various things, but my main issue seems to be not being able to match "internal" curly brackets. Any ideas (or any regex you have managed to get this done with)? For those of you who do not know what I'm talking about, here's a (somewhat abridged) example of what I'm trying to parse: http://regexr.com?38299 (What is needed is the part between {{Infobox ******* up to its
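Nested braces are awkward for a plain regex, so the sketch below swaps in a different technique: a small scanner that counts `{{` and `}}` pairs. It is an illustration under simple assumptions (the infobox starts with the literal text `{{Infobox`, and triple-brace parameters or `<nowiki>` sections are not handled); `extract_infobox` is a hypothetical name.

```python
from typing import Optional


def extract_infobox(wikitext: str) -> Optional[str]:
    """Return the first {{Infobox ...}} template, nested templates included."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None
    depth = 0
    i = start
    # Stop one character early so the two-character slice is always full.
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # The closing braces of the infobox itself were just consumed.
                return wikitext[start:i]
        else:
            i += 1
    return None  # unbalanced braces: no complete infobox found
```

The depth counter is what a regular expression without recursion support cannot express, which is why inner `{{...}}` templates defeat the usual patterns.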

How to get plain text out of wikipedia

梦想的初衷, submitted on 2019-11-29 21:02:15
I've been searching for about 2 months now to find a script that gets the Wikipedia description section only. (It's for a bot I'm building, not for IRC.) That is, when I say /wiki bla bla bla it will go to the Wikipedia page for bla bla bla, get the following, and return it to the chatroom: "Bla Bla Bla" is the name of a song made by Gigi D'Agostino. He described this song as "a piece I wrote thinking of all the people who talk and talk without saying anything". The prominent but nonsensical vocal samples are taken from UK band Stretch's song "Why Did You Do It" Here is the closest I've found