Multilingual spell checking with language detection

Posted by 随声附和 on 2019-12-05 05:42:20

You can use an API (Google or Yandex) for spell checking and language detection, but I don't think this option scales very well.

Another option is to use the free Lucene tools for spell checking (http://wiki.apache.org/lucene-java/SpellChecker), but you have to index some corpora first; Wikipedia is a good choice. Language detection can be achieved with TextCat (http://textcat.sourceforge.net/).
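To give an idea of how n-gram based language detection of the kind TextCat performs can work, here is a minimal Python sketch. It is not TextCat's code: the function names, the TOP_K cutoff and the two tiny training samples are assumptions made up for illustration, and a usable detector would build its profiles from a large corpus per language.

```python
from collections import Counter

TOP_K = 300  # assumed profile size; also used as the penalty for unseen n-grams

def ngram_profile(text, max_n=3, top_k=TOP_K):
    """Return the text's character n-grams (length 1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so the ranking is deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top_k]

def out_of_place(doc_profile, lang_profile, missing_penalty=TOP_K):
    """Sum of rank differences between two profiles (lower means more similar)."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    return sum(
        abs(i - rank[gram]) if gram in rank else missing_penalty
        for i, gram in enumerate(doc_profile)
    )

def detect_language(text, profiles):
    """Return the language whose profile is closest to the profile of the text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Illustrative training samples only; far too small for real use.
FRENCH_SAMPLE = ("La Charte de la langue française est une loi définissant les "
                 "droits linguistiques de tous les citoyens du Québec")
ENGLISH_SAMPLE = ("The Charter of the French Language is a law in the province "
                  "of Quebec in Canada defining French as the official language")

PROFILES = {
    "fr": ngram_profile(FRENCH_SAMPLE),
    "en": ngram_profile(ENGLISH_SAMPLE),
}

print(detect_language("une loi du Québec", PROFILES))
```

Once the language is known, the text can be handed to a spell checker for that language only, instead of being checked against every dictionary.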

With the LanguageTool library (http://www.languagetool.org) you can select the languages you need and have the content checked against that set of languages. E.g. for a French/English website you'd check the text against English and French. Obviously there will be more errors when you check against the wrong language.

Example:

If you check the French text from http://fr.wikipedia.org/wiki/Charte_de_la_langue_fran%C3%A7aise:

La Charte de la langue française (communément appelée la loi 101) est 
une loi définissant les droits linguistiques de tous les citoyens du 
Québec et faisant du français la langue officielle du Québec.

on http://www.languagetool.org it will show no errors for French and more than 20 errors for English/GB.

The corresponding English text:

The Charter of the French Language (French: La charte de la langue française), also 
known as Bill 101 (Law 101 or French: Loi 101), is a law in the province of Quebec 
in Canada defining French, the language of the majority of the population, as the 
official language of Quebec and framing fundamental language rights. It is the central
legislative piece in Quebec's language policy.

will show 4 errors for English/GB (due to the French citation) and more than 20 errors when you check it against French.
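The "fewest errors wins" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: WORDS, count_errors and best_language are hypothetical names, and the tiny word lists stand in for a real checker. In practice count_errors would call something like LanguageTool for the given language and count the reported errors.

```python
import re

# Hypothetical, deliberately tiny word lists standing in for a real checker.
WORDS = {
    "en": {"the", "charter", "of", "french", "language", "is", "a", "law",
           "in", "quebec"},
    "fr": {"la", "charte", "de", "langue", "française", "est", "une", "loi",
           "du", "québec"},
}

def count_errors(text, lang):
    """Count the tokens that the (toy) checker for `lang` does not recognise."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters, Unicode-aware
    return sum(1 for token in tokens if token not in WORDS[lang])

def best_language(text, langs):
    """Check the text against every candidate language and keep the one with the fewest errors."""
    errors = {lang: count_errors(text, lang) for lang in langs}
    return min(errors, key=errors.get), errors

lang, errors = best_language(
    "La Charte de la langue française est une loi du Québec", ["en", "fr"]
)
print(lang, errors)  # fr {'en': 11, 'fr': 0}
```

The cost is one full check per candidate language, so this approach fits sites with a small, known set of languages, as in the French/English case above.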
