How to retrieve Wiktionary word content?

花落未央 2020-11-28 18:27

How may Wiktionary's API be used to determine whether or not a word exists?

9 answers
  •  迷失自我
    2020-11-28 18:36

    As mentioned earlier, the problem with this approach is that Wiktionary contains entries for words in all languages, not just English. Checking whether a page exists through the MediaWiki API therefore won't work, because many pages exist only for non-English words. To get around this, you need to parse each page and check whether it has a section describing an English word. Parsing wikitext isn't trivial in general, but in this case it's not that bad: to cover almost all cases, you only need to check whether the wikitext contains an English language heading (==English==). Depending on your programming language, you may also find tools that build an AST from wikitext. This covers most cases but not all, because Wiktionary also includes entries for some common misspellings.
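
The heading check described above could look roughly like the following Python sketch. It is only an illustration under several assumptions: the function names (`fetch_wikitext`, `english_section`, `is_english_word`) and the User-Agent string are made up for this example, the page is fetched through the standard MediaWiki Action API (`action=query` with `prop=revisions`), and the misspelling filter simply looks for the `{{misspelling of|en|...}}` template, which is a crude heuristic rather than a complete solution.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"

# Language sections are level-2 headings, e.g. "==English==".
ENGLISH_HEADING = re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE)
# Any level-2 heading (but not "===...===" subsections).
LEVEL2_HEADING = re.compile(r"^==[^=].*?==\s*$", re.MULTILINE)


def fetch_wikitext(title):
    """Return the wikitext of a page, or None if the page does not exist."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        # Placeholder User-Agent; replace it with one identifying your tool.
        headers={"User-Agent": "wiktionary-word-check-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    pages = data.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing") or "revisions" not in pages[0]:
        return None
    return pages[0]["revisions"][0]["slots"]["main"]["content"]


def english_section(wikitext):
    """Return the text under the ==English== heading, or None if absent."""
    match = ENGLISH_HEADING.search(wikitext)
    if match is None:
        return None
    rest = wikitext[match.end():]
    next_heading = LEVEL2_HEADING.search(rest)
    return rest if next_heading is None else rest[:next_heading.start()]


def is_english_word(wikitext, reject_misspellings=True):
    """Heuristic: the page has an English section (optionally not a misspelling)."""
    section = english_section(wikitext)
    if section is None:
        return False
    # Crude assumption: misspelling entries use the "misspelling of" template.
    # A page that also has a legitimate sense would be rejected too.
    if reject_misspellings and "{{misspelling of|en|" in section:
        return False
    return True
```

A caller would combine the two steps, for example `text = fetch_wikitext("apple")` followed by `text is not None and is_english_word(text)`. Keeping the network call separate from the text check makes the heuristic easy to test against saved wikitext.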

    As an alternative, you could try Lingua Robot or something similar. Lingua Robot parses the Wiktionary content and exposes it as a REST API; a non-empty response means that the word exists. Note that, unlike Wiktionary, the API doesn't include any misspellings (at least at the time of writing this answer). Note also that Wiktionary contains not only single words but also multi-word expressions.
