enWiki dump python function

最后都变了 - submitted on 2020-02-08 02:31:32

Question


I am looking to create a function that goes through the articles in the XML file and does the following:

    for each article:
        if it contains the keywords moral or ethic (wildcard search):
            move it to another folder
        else:
            ignore
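The loop above could look roughly like this in Python. This is a minimal sketch, assuming each article has already been extracted as a `(title, text)` pair and that "move it to another folder" means writing the article text to its own file in an output directory. `save_matching_articles`, `safe_filename`, the `is_match` predicate, and the `<title>.txt` naming scheme are all hypothetical names chosen for illustration.

```python
import os
import re
from typing import Callable, Iterable, Tuple


def safe_filename(title: str) -> str:
    """Turn an article title into a filesystem-safe file name (hypothetical scheme)."""
    # Replace anything that is not a word character, dot or dash with "_",
    # e.g. "AC/DC" -> "AC_DC.txt".
    return re.sub(r"[^\w.-]", "_", title) + ".txt"


def save_matching_articles(
    articles: Iterable[Tuple[str, str]],
    out_dir: str,
    is_match: Callable[[str], bool],
) -> int:
    """Write each article whose text satisfies is_match into out_dir.

    Non-matching articles are simply skipped. Returns how many files were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = 0
    for title, text in articles:
        if is_match(text):
            path = os.path.join(out_dir, safe_filename(title))
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)
            saved += 1
        # else: ignore the article
    return saved
```

Passing the keyword test in as `is_match` keeps the "which articles?" decision separate from the "where do they go?" step, so either can be changed independently. Note that two different titles can map to the same file name under this simple scheme (for example `A/B` and `A:B`), in which case the later one overwrites the earlier.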

I have tried a few things and looked around, but I am really struggling (I am not even sure whether wildcard searching is possible), as I have only just started using Python. Any help would be much appreciated.
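Wildcard-style matching is possible with Python's built-in `re` module. A minimal sketch, assuming "moral" and "ethic" are meant as prefix wildcards (so *morals*, *morality*, *ethics* and *ethical* all count) and that case should be ignored. `KEYWORD_PATTERN`, `contains_keywords` and `find_keywords` are hypothetical names.

```python
import re

# \b anchors each keyword at the start of a word; \w* plays the role of the "*"
# wildcard, so "moral" matches moral/morals/morality and "ethic" matches
# ethic/ethics/ethical. IGNORECASE also catches "Moral", "ETHICS", etc.
KEYWORD_PATTERN = re.compile(r"\b(?:moral|ethic)\w*", re.IGNORECASE)


def contains_keywords(text: str) -> bool:
    """Return True if text contains a word starting with 'moral' or 'ethic'."""
    return KEYWORD_PATTERN.search(text) is not None


def find_keywords(text: str) -> list:
    """Return every matching word, handy for checking what the wildcard caught."""
    return KEYWORD_PATTERN.findall(text)


print(find_keywords("Morality and ethics; immoral acts"))  # ['Morality', 'ethics']
```

Because of the leading `\b`, words such as *immoral* or *unethical* are not matched; removing the `\b` would include them, at the cost of also matching the keywords anywhere inside longer words.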

Here is an example of the XML:

  <page>
    <title>Anarchism</title>
    <ns>0</ns>
    <id>12</id>
    <revision>
      <id>932020697</id>
      <parentid>932020422</parentid>
      <timestamp>2019-12-22T22:28:39Z</timestamp>
      <contributor>
        <username>El C</username>
        <id>92203</id>
      </contributor>
      <minor />
      <comment>Reverted edits by [[Special:Contribs/98.181.248.149|98.181.248.149]] ([[User talk:98.181.248.149|talk]]) to last version by InternetArchiveBot</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text xml:space="preserve">
#whole text from article#
</text>
      <sha1>3cwj5oszq9qyabe0sy3sts0tnhysbvm</sha1>
    </revision>
  </page>
#next article#
  <page>
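A full English Wikipedia dump is many gigabytes, so loading it in one go with `ElementTree.parse` would likely exhaust memory. One common approach, sketched here under assumptions, is to stream the file with `xml.etree.ElementTree.iterparse` and clear finished `<page>` elements as you go. Real dumps declare a MediaWiki export namespace on the root `<mediawiki>` element (not shown in the excerpt above), so the sketch compares local tag names and ignores the namespace. `iter_articles` and `local_name` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple, Union


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source: Union[str, IO[bytes]]) -> Iterator[Tuple[str, str]]:
    """Stream (title, text) pairs from a MediaWiki XML dump, one <page> at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                # If a page had several revisions, this keeps the last <text> seen.
                text = child.text or ""
        yield title, text
        # Drop finished pages from the tree so memory stays roughly flat.
        root.clear()
```

Each `(title, text)` pair can then be run through a keyword test and, if it matches, written out to the target folder. The dumps are usually downloaded compressed; `iterparse` also accepts a binary file object, so something like `iter_articles(bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb"))` should work without decompressing to disk first (file name assumed).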

Source: https://stackoverflow.com/questions/60073966/enwiki-dump-python-function
