wikipedia

Create a HashMap with a fixed Key corresponding to a HashSet

Submitted by 吃可爱长大的小学妹 on 2019-12-01 12:21:11
My aim is to create a HashMap with a String as the key and a HashSet of Strings as the value. This is what the output looks like now:

    [Hudson+(surname)=[Q2720681], Hudson,+Quebec=[Q141445], Hudson+(given+name)=[Q5928530], Hudson,+Colorado=[Q2272323], Hudson,+Illinois=[Q2672022], Hudson,+Indiana=[Q2710584], Hudson,+Ontario=[Q5928505], Hudson,+Buenos+Aires+Province=[Q10298710], Hudson,+Florida=[Q768903]]

What I want it to look like instead is this:

    [Hudson+(surname)=[Q2720681,Q141445,Q5928530,Q2272323,Q2672022]]

The purpose is to store a particular name in Wikidata and then all
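
A minimal Java sketch of the usual grouping pattern, assuming the goal is one HashSet of Q-ids per name (the names and ids below are taken from the sample output; computeIfAbsent does the merging):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class NameIndex {
        // computeIfAbsent creates the set the first time a key is seen,
        // so every later id for the same key lands in the same set
        // instead of overwriting the previous entry.
        static void addEntry(Map<String, Set<String>> map, String name, String id) {
            map.computeIfAbsent(name, k -> new HashSet<>()).add(id);
        }

        public static void main(String[] args) {
            Map<String, Set<String>> byName = new HashMap<>();
            addEntry(byName, "Hudson+(surname)", "Q2720681");
            addEntry(byName, "Hudson+(surname)", "Q141445");
            addEntry(byName, "Hudson,+Quebec", "Q141445");
            System.out.println(byName); // {Hudson+(surname)=[Q2720681, Q141445], ...}
        }
    }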

filter data from mediawiki api ios

Submitted by 馋奶兔 on 2019-12-01 12:09:11
Question: I used the "action=query&prop=revisions&rvprop=content&titles=%@&format=json&redirects" API to get the details about Anil_Ambani. In response I got the following dictionary:

    query = {
        normalized = ( { from = "Anil_Ambani"; to = "Anil Ambani"; } );
        pages = {
            1222313 = {
                ns = 0;
                pageid = 1222313;
                revisions = ( {
                    "*" = "{{BLP sources|date=June 2012}}\n{{Infobox person\n| name = Anil Ambani \n| image =AnilAmbani.jpg\n| image_size = \n| caption = Ambani in 2009\n| birth_date = {{Birth date
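
For completeness, a hedged sketch of walking that response shape in Java (the question itself is iOS, but the traversal is the same; the org.json library here is an assumption, not something the question uses):

    import org.json.JSONObject;

    public class RevisionText {
        // Walks query -> pages -> <pageid> -> revisions[0] -> "*",
        // which is where the raw wikitext sits in the response above.
        static String extractWikitext(String responseBody) {
            JSONObject pages = new JSONObject(responseBody)
                    .getJSONObject("query")
                    .getJSONObject("pages");
            String pageId = pages.keys().next(); // a single title was requested
            return pages.getJSONObject(pageId)
                        .getJSONArray("revisions")
                        .getJSONObject(0)
                        .getString("*");
        }
    }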

How to form a DBpedia iSPARQL query (for Wikipedia content)

Submitted by 你。 on 2019-12-01 10:41:09
Question: Say I need to fetch content from Wikipedia about all mountains. My target is to show the initial paragraph and an image from the respective article (e.g. Monte Rosa and Vincent Pyramid). I came across DBpedia, and with some research found that it provides live queries into the Wikipedia database directly. I have two questions: 1 - I am finding it difficult to formulate my queries, and I can't get anywhere with iSPARQL. I tried the following query, but it throws an error saying invalid XML: SELECT DISTINCT
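
A hedged sketch of the general shape in Java, assuming the public endpoint at https://dbpedia.org/sparql and the dbo:Mountain, dbo:abstract, and dbo:thumbnail terms (assumptions worth verifying against the current ontology):

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class MountainQuery {
        public static void main(String[] args) throws Exception {
            // English abstract plus thumbnail image for each mountain.
            String sparql =
                "PREFIX dbo: <http://dbpedia.org/ontology/> " +
                "SELECT DISTINCT ?m ?abstract ?thumb WHERE { " +
                "  ?m a dbo:Mountain ; dbo:abstract ?abstract ; dbo:thumbnail ?thumb . " +
                "  FILTER(lang(?abstract) = 'en') " +
                "} LIMIT 10";
            String url = "https://dbpedia.org/sparql?format=application/sparql-results%2Bjson"
                    + "&query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(url)).build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.body());
        }
    }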

API to get Wikipedia revision id by date [closed]

Submitted by 拥有回忆 on 2019-12-01 09:16:04
Is there any API to get a Wikipedia revision id by date, instead of checking the whole revision history and extracting the most recent revision before that date? Thank you!

Bergi: The revision query API allows you to pass timestamps to get only revisions from a specified interval. Use

    api.php?action=query&prop=revisions&rvlimit=1&rvstart=myTimestamp

Check out Manual:wfTimestamp for the accepted formats - yyyymmddhhmmss always works.

Building on the previous answer: the "always accepted" format yyyymmddhhmmss returned errors for me. This example returned a valid response: JSON XML The id in field revid
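
A minimal sketch of the call in Java (Java 11+ HttpClient; the title and timestamp are placeholders): rvdir=older with rvlimit=1 returns the latest revision at or before rvstart.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RevisionByDate {
        public static void main(String[] args) throws Exception {
            // Most recent revision of the article at or before 2019-01-01 00:00:00;
            // the id sits in query.pages.<pageid>.revisions[0].revid of the response.
            String url = "https://en.wikipedia.org/w/api.php?action=query&prop=revisions"
                    + "&rvlimit=1&rvdir=older&rvstart=20190101000000"
                    + "&titles=Stack_Overflow&format=json";
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(url)).build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.body());
        }
    }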

dbpedia fetch entities in language other than english

Submitted by Deadly on 2019-12-01 09:11:43
I'm trying to extract an entity dictionary (person names, etc.) from DBpedia using SPARQL.

    PREFIX owl: <http://dbpedia.org/ontology/>
    PREFIX dbpprop: <http://dbpedia.org/property/>
    SELECT ?name WHERE {
        ?person a owl:Person .
        ?person dbpprop:name ?name .
        FILTER(lang(?name) = "en")
    }

The query above succeeds, but when I change the language tag to fr, nothing comes back. How can I fetch names in other languages? Moreover, why can't I filter language using the query below?

    SELECT ?name WHERE {
        ?person a owl:Person .
        ?person dbpprop:language "English"
        ?person dbpprop:name ?name .
    } //
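
Two hedged observations: the second query is missing the '.' separator after the "English" triple pattern, so it will not even parse; and on the main dbpedia.org endpoint few name literals carry an @fr tag, the usual place to look being the French chapter endpoint at http://fr.dbpedia.org/sparql (an assumption about where the data sits, not a guarantee). A corrected query, written here as a Java string constant:

    public class FrenchNames {
        // Corrected query with the '.' separators between triple patterns,
        // intended for http://fr.dbpedia.org/sparql.
        static final String QUERY =
            "PREFIX owl: <http://dbpedia.org/ontology/> " +
            "PREFIX dbpprop: <http://dbpedia.org/property/> " +
            "SELECT ?name WHERE { " +
            "  ?person a owl:Person . " +
            "  ?person dbpprop:name ?name . " +
            "  FILTER(lang(?name) = 'fr') " +
            "} LIMIT 100";
    }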

Removing html tags when crawling wikipedia with python's urllib2 and Beautifulsoup

Submitted by 孤人 on 2019-12-01 07:40:33
Question: I am trying to crawl Wikipedia to get some data for text mining, using Python's urllib2 and BeautifulSoup. My question is: is there an easy way of getting rid of the unnecessary tags (like links, 'a's or 'span's) from the text I read? For this scenario:

    import urllib2
    from BeautifulSoup import *
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    infile = opener.open("http://en.wikipedia.org/w/index.php?title=data_mining&printable=yes")
    pool =
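
The question is Python-specific; as a hedged alternative in Java, the jsoup library used elsewhere on this page can do the same stripping - unwrap() drops the <a> and <span> tags while keeping the text inside them:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class DropTags {
        public static void main(String[] args) throws Exception {
            Document doc = Jsoup.connect(
                    "http://en.wikipedia.org/w/index.php?title=data_mining&printable=yes")
                    .userAgent("Mozilla/5.0")
                    .get();
            // unwrap() removes the tags themselves but keeps their contents,
            // leaving the rest of the page structure intact.
            doc.select("a, span").unwrap();
            System.out.println(doc.body().html());
        }
    }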

Why can't I fetch wikipedia pages with LWP::Simple?

Submitted by 邮差的信 on 2019-12-01 03:28:34
I'm trying to fetch Wikipedia pages using LWP::Simple, but they're not coming back. This code:

    #!/usr/bin/perl
    use strict;
    use LWP::Simple;
    print get("http://en.wikipedia.org/wiki/Stack_overflow");

doesn't print anything. But if I use some other web page, say http://www.google.com, it works fine. Is there some other name I should be using to refer to Wikipedia pages? What could be going on here?

Apparently Wikipedia blocks LWP::Simple requests: http://www.perlmonks.org/?node_id=695886 The following works instead:

    #!/usr/bin/perl
    use strict;
    use LWP::UserAgent;
    my $url = "http://en
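
The same fix sketched in Java, on the assumption (per the PerlMonks thread) that the block keys on the default library User-Agent string, so any descriptive agent works:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class FetchWiki {
        public static void main(String[] args) throws Exception {
            // Identifying the client explicitly is what makes the request succeed.
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create("https://en.wikipedia.org/wiki/Stack_overflow"))
                    .header("User-Agent", "MyFetcher/1.0 (contact@example.org)")
                    .build();
            HttpResponse<String> resp = HttpClient.newHttpClient()
                    .send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode());
        }
    }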

Retrieve another language of a Wikipedia page

Submitted by 喜欢而已 on 2019-11-30 22:27:19
Task: We have an English Wikipedia page and need to retrieve the address of the same page in Russian. I know the Semantic Web solution - a simple query to DBpedia - but I am curious whether there are more traditional solutions. I asked the same question on semanticoverflow.com, where Toby Inkster suggested parsing the output of http://en.wikipedia.org/wiki/Colugo?action=raw (the interlanguage links are at the bottom), but that approach is too inefficient. Are there any other ways, or is DBpedia the only real option? Wikipedia has an extensive API, which can provide language-link information among other things. In
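
A hedged sketch of that API call in Java: prop=langlinks with lllang=ru asks only for the Russian interlanguage link, and the answer comes back under query.pages.<pageid>.langlinks.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LangLinks {
        public static void main(String[] args) throws Exception {
            // Russian counterpart of the Colugo article used in the question.
            String url = "https://en.wikipedia.org/w/api.php?action=query&prop=langlinks"
                    + "&lllang=ru&titles=Colugo&format=json";
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(url)).build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.body());
        }
    }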

jsoup - extract text from wikipedia article

Submitted by 久未见 on 2019-11-30 19:49:58
Question: I'm writing some Java code to perform NLP tasks on texts using Wikipedia. How can I use jsoup to extract all the text of a Wikipedia article (for example, all the text in http://en.wikipedia.org/wiki/Boston)?

Answer 1:

    Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Boston").get();
    Element contentDiv = doc.select("div[id=content]").first();
    contentDiv.toString(); // The result

You retrieve formatted content this way, of course. If you want "raw" content you can filter the
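
Where the truncated answer is presumably headed: if plain text rather than HTML is wanted, Element.text() strips every tag in one call - a sketch along the same lines:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class BostonText {
        public static void main(String[] args) throws Exception {
            Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Boston").get();
            Element contentDiv = doc.select("div#content").first();
            // text() discards all markup and returns only the visible text,
            // which is usually what an NLP pipeline wants.
            System.out.println(contentDiv.text());
        }
    }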