PHP + Wikipedia: Get content from the first paragraph in a Wikipedia article?

Posted by 只谈情不闲聊 on 2019-12-07 19:36:49

Question


I’m trying to use Wikipedia’s API (api.php) to get the content of a Wikipedia article, given a link to it (for example: http://en.wikipedia.org/wiki/Stackoverflow). What I want is the first paragraph, which for the Stack Overflow article is: "Stack Overflow is a website part of the Stack Exchange network[2][3] featuring questions and answers on a wide range of topics in computer programming.[4][5][6]"

I’m going to do some data manipulation with it.

I’ve tried the API URL http://en.wikipedia.org/w/api.php?action=parse&page=Stackoverflow&format=xml but it does not return the article text; it gives me what looks like some kind of error. It outputs:

<api>
<parse displaytitle="Stackoverflow" revid="289948401">
<text xml:space="preserve">
<ol> <li>REDIRECT <a href="/wiki/Stack_Overflow" title="Stack Overflow">Stack Overflow</a></li> </ol> <!-- NewPP limit report Preprocessor node count: 1/1000000 Post-expand include size: 0/2048000 bytes Template argument size: 0/2048000 bytes Expensive parser function count: 0/500 --> <!-- Saved in parser cache with key enwiki:pcache:idhash:21772484-0!*!0!!*!* and timestamp 20110525165333 -->
</text>
<langlinks/>
<categories/>
<links>
<pl ns="0" exists="" xml:space="preserve">Stack Overflow</pl>
</links>
<templates/>
<images/>
<externallinks/>
<sections/>
</parse>
</api>

I also found this snippet of code and tried it:

$doc = new DOMDocument();
$doc->loadHTML($wikiPage);
$xpath = new DOMXpath($doc);
$nlPNodes = $xpath->query('//div[@id="bodyContent"]/p');
$nFirstP = $nlPNodes->item(0);
$sFirstP = $doc->saveXML($nFirstP);
echo $sFirstP; 

but I can’t get the HTML content in the variable $wikiPage.

I don’t know whether this is the best way to do it, so please feel free to comment on that. Any other suggestions or solutions would be much appreciated.

Thank you
- Mestika


Answer 1:


You're getting the contents of a redirect page: 'Stackoverflow' redirects to 'Stack Overflow', so the parsed text is only the redirect link. Replace 'Stackoverflow' with 'Stack_Overflow' and it should work.

The API also supports a redirects option (append &redirects to the query string), which will resolve redirects for you.
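As a rough illustration of the two points above, here is a minimal sketch that builds an action=parse URL with redirects enabled and pulls the first non-empty paragraph out of the returned HTML. The function names (buildParseUrl, firstParagraph, fetchFirstParagraph) and the User-Agent string are made up for this example. It also assumes format=json, where the rendered HTML is expected under parse → text → "*"; check the actual response before relying on that shape.

```php
<?php
// Build an api.php URL for action=parse. The "redirects" parameter asks the
// API to follow redirects such as Stackoverflow -> Stack Overflow.
function buildParseUrl(string $page): string
{
    $params = [
        'action'    => 'parse',
        'page'      => $page,
        'prop'      => 'text',
        'redirects' => 1,
        'format'    => 'json',
    ];
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Return the text of the first non-empty <p> in an HTML fragment, or null.
function firstParagraph(string $html): ?string
{
    $doc = new DOMDocument();
    // The XML encoding prefix makes loadHTML treat the fragment as UTF-8;
    // @ silences warnings about markup libxml does not recognise.
    @$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text;
        }
    }
    return null;
}

// Hypothetical wrapper: fetch the page over HTTP and extract the paragraph.
function fetchFirstParagraph(string $page): ?string
{
    // Placeholder User-Agent; replace it with something identifying your app.
    $context = stream_context_create([
        'http' => ['user_agent' => 'FirstParagraphExample/0.1 (you@example.com)'],
    ]);
    $json = file_get_contents(buildParseUrl($page), false, $context);
    if ($json === false) {
        return null;
    }
    $data = json_decode($json, true);
    // Assumed response shape for format=json: parse -> text -> "*".
    $html = $data['parse']['text']['*'] ?? null;
    return $html === null ? null : firstParagraph($html);
}
```

Calling fetchFirstParagraph('Stackoverflow') should then follow the redirect instead of returning the redirect stub. Note that textContent keeps footnote markers such as [2][3], so you may want to strip those before further processing.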



Source: https://stackoverflow.com/questions/6128168/php-wikipedia-get-content-from-the-first-paragraph-in-a-wikipedia-article
