How to determine the (natural) language of a document?


情话喂你 2020-12-24 07:16

I have a set of documents in two languages: English and German. There is no usable meta information about these documents; a program can look only at the content. Based on t

11 answers

轻奢々 (original poster)

2020-12-24 08:19

Isn't the problem several orders of magnitude easier if you've only got two languages (English and German) to choose from? In this case your approach of a list of stop words might be good enough.

Obviously you'd need to consider a rewrite if you added more languages to your list.
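
For the two-language case, the stop-word idea could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the word lists are small hand-picked samples, and the function name `guess_language` is hypothetical. It counts how many tokens fall into each list and picks the language with more hits.

```python
import re

# Small, hand-picked sample lists (an assumption for illustration only;
# a real list would be longer). Note that "die" is also an English word,
# so very short texts can be misclassified.
ENGLISH_STOP_WORDS = {"the", "and", "of", "to", "is", "that", "it",
                      "with", "for", "this", "are", "not"}
GERMAN_STOP_WORDS = {"der", "die", "das", "und", "ist", "nicht", "ein",
                     "eine", "mit", "für", "von", "zu", "den", "auf", "sich"}


def guess_language(text):
    """Return "english", "german", or "unknown" based on stop-word counts."""
    # Split into lowercase runs of letters (Unicode-aware, so umlauts survive).
    words = re.findall(r"[^\W\d_]+", text.lower())
    english_hits = sum(1 for w in words if w in ENGLISH_STOP_WORDS)
    german_hits = sum(1 for w in words if w in GERMAN_STOP_WORDS)
    if english_hits == german_hits:
        return "unknown"  # tie, or no stop words found at all
    return "english" if english_hits > german_hits else "german"


if __name__ == "__main__":
    print(guess_language("The cat is on the mat and it is happy"))
    print(guess_language("Der Hund ist nicht mit der Katze auf dem Sofa"))
```

Returning "unknown" on a tie keeps the sketch honest about texts that contain no listed words. As the answer notes, this design does not scale well: each added language needs its own list, and closely related languages share many short words.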
