Wikipedia: Java library to remove Wikipedia text markup

灰色年华 2020-12-19 04:10

I downloaded a Wikipedia dump and now want to remove the Wikipedia markup from the contents of each page. I tried writing regular expressions, but there are too many cases to handle. I

5 Answers
  •  执念已碎
    2020-12-19 04:51

    Mylyn WikiText can convert various wiki syntaxes into HTML and other formats. It also supports MediaWiki syntax, which is what Wikipedia uses. Although Mylyn WikiText is primarily an Eclipse plugin, it is also available as a standalone library.
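    A minimal sketch of using WikiText outside Eclipse to turn MediaWiki markup into an HTML fragment. It assumes the standalone WikiText 3.x jars (core parser plus the MediaWiki language) are on the classpath; older releases put these classes under packages containing `.core`, so the imports may need adjusting. The class name `WikiMarkupToHtml` and the sample markup are hypothetical.

```java
import java.io.StringWriter;

import org.eclipse.mylyn.wikitext.mediawiki.MediaWikiLanguage;
import org.eclipse.mylyn.wikitext.parser.MarkupParser;
import org.eclipse.mylyn.wikitext.parser.builder.HtmlDocumentBuilder;

// Hypothetical helper class; assumes WikiText 3.x package names.
public class WikiMarkupToHtml {

    /** Converts a MediaWiki-markup string into an HTML fragment. */
    public static String toHtml(String mediaWikiMarkup) {
        StringWriter out = new StringWriter();
        HtmlDocumentBuilder builder = new HtmlDocumentBuilder(out);
        // Emit only the body content, not a full <html><body> document.
        builder.setEmitAsDocument(false);

        // The parser reads MediaWiki syntax and drives the HTML builder.
        MarkupParser parser = new MarkupParser(new MediaWikiLanguage());
        parser.setBuilder(builder);
        parser.parse(mediaWikiMarkup);
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample page text (hypothetical) with a heading, bold text and an internal link.
        String markup = "== History ==\n'''Java''' is a [[programming language]].";
        System.out.println(toHtml(markup));
    }
}
```

    If you need plain text rather than HTML, the HTML fragment can then be passed to an HTML-to-text step of your choice; that step is not part of the sketch above.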
