wikipedia

Generating plain text from a Wikipedia database dump

Posted by 烂漫一生 on 2019-12-22 04:41:10
Question: I found a Python script (here: Wikipedia Extractor) that can generate plain text from an (English) Wikipedia database dump. When I use this command (as stated on the script's page):

    $ python enwiki-latest-pages-articles.xml WikiExtractor.py -b 500K -o extracted

I get this error:

    File "enwiki-latest-pages-articles.xml", line 1
    < mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml…
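
That error output is Python reporting a syntax error on line 1 of the XML dump, which suggests the interpreter is executing the dump itself as a script because it is passed before WikiExtractor.py. A minimal fix, assuming the usual WikiExtractor invocation where the script comes first and the dump is its input argument:

    $ python WikiExtractor.py -b 500K -o extracted enwiki-latest-pages-articles.xml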

Java: splitting up a large XML file with SAXParser

Posted by 让人想犯罪 __ on 2019-12-21 23:01:17
Question: I am trying to split a large XML file into smaller files using Java's SAXParser (specifically the Wikipedia dump, which is about 28 GB uncompressed). I have a PageHandler class which extends DefaultHandler:

    private class PageHandler extends DefaultHandler {
        private StringBuffer text;
        ...
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.append("<" + qName + ">");
        }

        @Override
        public void endElement(String uri, String localName, String qName) { …
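
For comparison, here is the same buffer-per-page idea as a minimal sketch in Python's xml.sax API (the chunk size, output file naming, and the assumption that articles live in <page> elements of the MediaWiki dump are illustrative; element attributes and the enclosing <mediawiki> root are dropped for brevity, as in the question's handler):

    import xml.sax
    from xml.sax.handler import ContentHandler
    from xml.sax.saxutils import escape

    class PageSplitter(ContentHandler):
        def __init__(self, pages_per_file=1000):
            self.buf = []
            self.count = 0
            self.fileno = 0
            self.pages_per_file = pages_per_file
            self.in_page = False

        def startElement(self, name, attrs):
            if name == "page":
                self.in_page = True
            if self.in_page:
                self.buf.append("<%s>" % name)  # attributes dropped, as in the question

        def characters(self, content):
            if self.in_page:
                self.buf.append(escape(content))  # re-escape text content

        def endElement(self, name):
            if self.in_page:
                self.buf.append("</%s>" % name)
            if name == "page":
                self.in_page = False
                self.count += 1
                if self.count % self.pages_per_file == 0:
                    self.flush()

        def endDocument(self):
            if self.buf:
                self.flush()  # write the final partial chunk

        def flush(self):
            with open("chunk-%05d.xml" % self.fileno, "w", encoding="utf-8") as out:
                out.write("".join(self.buf))
            self.fileno += 1
            self.buf = []

    xml.sax.parse("enwiki-latest-pages-articles.xml", PageSplitter())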

wikipedia api: get parsed introduction only

Posted by 岁酱吖の on 2019-12-21 22:55:55
Question: Using PHP, is there a nice way to get only the (parsed) introduction from a Wikipedia page? I currently have two methods. The first is to call the API page, then call the wiki parser on the introduction pulled from the first request (two requests, and extracting the intro from the text isn't pretty either). The second is to call the entire-page parser and use XPath to retrieve every <p> tag before the contents table. With both methods I then have to re-parse the HTML to ensure the…
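
One way that avoids both round trips is the TextExtracts prop of the query API, whose exintro flag limits the extract to the content before the first section heading (available on Wikimedia wikis; shown here as a Python sketch rather than PHP, with an assumed article title):

    import requests

    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": 1,        # only the lead section, before the first heading
        "titles": "Stack Overflow",
    }
    resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
    for page in resp.json()["query"]["pages"].values():
        print(page.get("extract", ""))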

TripleDES key sizes - .NET vs Wikipedia

Posted by 依然范特西╮ on 2019-12-21 12:36:54
Question: According to Wikipedia, TripleDES supports 56-, 112-, and 168-bit key lengths, but System.Cryptography.TripleDESCryptoServiceProvider.LegalKeySizes says it only accepts 128- and 192-bit key lengths. The system I'm developing needs to be interoperable (data encrypted by my code needs to be decryptable in PHP, Java, and Objective-C), and I don't know who is correct in this case. So who should I believe? And how can I be sure my encrypted data is portable?

Answer 1: Wikipedia does not say TripleDES…
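
The two figures describe the same keys counted differently: each underlying DES key occupies 64 stored bits, of which 8 are parity, leaving 56 effective bits. A quick sketch of the arithmetic:

    # each DES key: 8 bytes stored, 1 parity bit per byte -> 56 effective bits
    for des_keys in (2, 3):                 # 2-key and 3-key TripleDES
        stored = des_keys * 64              # what .NET's LegalKeySizes reports
        effective = des_keys * 56           # what Wikipedia quotes
        print(f"{des_keys}-key 3DES: {stored} stored bits, {effective} effective bits")

So .NET's 128/192 and Wikipedia's 112/168 refer to the same two- and three-key variants.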

prop=extracts not returning all extracts in the WikiMedia API

Posted by 浪尽此生 on 2019-12-21 06:14:43
Question: I would like to use the Wikipedia API to return the extracts from multiple Wikipedia articles at once. I am trying, for example, the following request (I chose the pageids randomly):

    http://en.wikipedia.org/w/api.php?format=xml&action=query&pageids=3258248|11524059&prop=extracts&exsentences=1

But it only contains the extract for the first pageid, not the second. Other properties do not seem to have this limitation. For example:

    http://en.wikipedia.org/w/api.php?format=xml&action=query…
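
TextExtracts may return fewer extracts per request than pages asked for (the exlimit parameter bounds it, and some extract modes are capped even lower); the remainder is signalled through the API's standard continuation mechanism. A Python sketch that follows the continuation until every requested page has its extract (assumes JSON output rather than the XML used above):

    import requests

    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "format": "json",
        "action": "query",
        "pageids": "3258248|11524059",
        "prop": "extracts",
        "exsentences": 1,
        "exlimit": "max",
    }
    extracts = {}
    while True:
        data = requests.get(url, params=params).json()
        for pid, page in data["query"]["pages"].items():
            if "extract" in page:
                extracts[pid] = page["extract"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries the continuation tokens
    print(extracts)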

MYSQL Insert Huge SQL Files of GB in Size

Posted by 只愿长相守 on 2019-12-21 06:08:11
Question: I'm trying to create a Wikipedia DB copy (around 50 GB), but I'm having problems with the largest SQL files. I've split the multi-GB files into 300 MB chunks using the Linux split utility, e.g.:

    split -d -l 50 ../enwiki-20070908-page page.input

On average, a 300 MB file takes 3 hours on my server (Ubuntu 12.04, MySQL 5.5). I'm importing like this:

    mysql -u username -ppassword database < category.sql

Note: these files consist of INSERT statements; they are not CSV…
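
A common way to speed up this kind of bulk load is to switch off per-row bookkeeping for the duration of the import and commit once at the end. A minimal sketch that generates wrapper files around the dump (filenames are from the question; the SET statements are standard MySQL session variables, but verify they suit your schema before relying on them):

    # generate prefix/suffix files for a faster bulk-import session
    prefix = (
        "SET autocommit=0;\n"
        "SET unique_checks=0;\n"
        "SET foreign_key_checks=0;\n"
    )
    suffix = (
        "COMMIT;\n"
        "SET unique_checks=1;\n"
        "SET foreign_key_checks=1;\n"
    )
    with open("prefix.sql", "w") as f:
        f.write(prefix)
    with open("suffix.sql", "w") as f:
        f.write(suffix)
    # then run all three in one client session:
    #   cat prefix.sql category.sql suffix.sql | mysql -u username -ppassword database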

random seek in 7z single file archive

Posted by 我们两清 on 2019-12-21 03:16:17
Question: Is it possible to do random access (a lot of seeks) into a very large file compressed with 7zip? The original file is huge (999 GB of XML) and I can't store it unpacked (I don't have that much free space). So, if the 7z format allows access to a middle block without decompressing all the blocks before it, I can build an index of block starts and the corresponding offsets in the original file. The header of my 7z archive is:

    37 7A BC AF 27 1C 00 02 28 99 F1 9D 4A 46 D7 EA   // 7z archive version 2; crc; n…
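
For orientation, the fixed-size start header at the front of every 7z archive points at the "next header", which holds the archive's folder/block layout; whether seeks are cheap then depends on whether the archive was compressed as one solid block or many. A sketch of parsing the 32-byte start header, following the layout in the 7z format documentation (the archive name is an assumption):

    import struct

    with open("archive.7z", "rb") as f:
        start = f.read(32)

    assert start[:6] == bytes([0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C])  # signature
    major, minor = start[6], start[7]                  # format version bytes
    start_header_crc = struct.unpack("<I", start[8:12])[0]
    next_header_offset, next_header_size = struct.unpack("<QQ", start[12:28])
    next_header_crc = struct.unpack("<I", start[28:32])[0]

    # the metadata ("next header"), including the block layout needed for
    # an index, begins at byte 32 + next_header_offset
    print(hex(32 + next_header_offset), next_header_size)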

How to add custom menu item to UITextView menu, which is a link to the Wikipedia page of the selected word?

Posted by 有些话、适合烂在心里 on 2019-12-20 19:36:54
Question: I am new to Xcode; I am using version 4.6.3 (my MacBook is too old for the new version). I have looked around the internet and Stack Overflow and either cannot find what I want or cannot get the snippets to work. I would like to add a menu item to the menu items that appear when long-pressing a word in a UITextView. I want it to say "Wiki", and when it is pressed, it should link to the Wikipedia page of the selected word. It could be through Safari, or should I do this within the app with a web view? I found:…