How to determine the (natural) language of a document?

情话喂你 2020-12-24 07:16

I have a set of documents in two languages: English and German. There is no usable meta information about these documents; a program can look at the content only. Based on t

11 answers
  •  轻奢々 (original poster)
    2020-12-24 08:19

    Isn't the problem several orders of magnitude easier if you've only got two languages (English and German) to choose from? In this case your approach of a list of stop words might be good enough.

    Obviously you'd need to consider a rewrite if you added more languages to your list.
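
For what it's worth, here is a minimal sketch of the stop-word idea for the two-language case. The word lists and the `guess_language` function name are my own illustrative assumptions, not taken from the question; a real list would be longer, and ties fall back to "unknown".

```python
import re

# Small illustrative stop-word lists (assumed for this sketch, not exhaustive).
# Words that are common in both languages (e.g. "in", "was") are left out on purpose.
ENGLISH_STOP_WORDS = {"the", "and", "of", "to", "is", "that", "it", "with", "for", "this"}
GERMAN_STOP_WORDS = {"der", "die", "das", "und", "ist", "nicht", "mit", "ein", "eine", "für"}


def guess_language(text):
    """Return 'en', 'de', or 'unknown' by counting stop-word hits."""
    # Lowercase and keep only runs of letters, including German umlauts and ß.
    words = re.findall(r"[a-zäöüß]+", text.lower())
    english_hits = sum(1 for word in words if word in ENGLISH_STOP_WORDS)
    german_hits = sum(1 for word in words if word in GERMAN_STOP_WORDS)
    if english_hits == german_hits:
        # No evidence either way (or an exact tie): do not guess.
        return "unknown"
    return "en" if english_hits > german_hits else "de"
```

Calling `guess_language(document_text)` on each document's content would then give a label per document. As noted above, this only stays simple because there are exactly two candidate languages.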
