How to detect language

你的背包 2020-12-28 21:18

Are there any good, open-source engines out there for detecting what language a text is in, perhaps with a probability metric? One that I can run locally and doesn't query

7 Answers
  •  梦毁少年i
    2020-12-28 21:55

    I don't think you need anything very sophisticated. For example, to detect whether a document is in English with a fairly high level of certainty, simply test whether it contains the N most common English words, something like:

    "the a an is to are in on it"
    

    If it contains all of those, I would say it is almost definitely English.
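
The common-word check described in this answer can be sketched in a few lines of Python. This is only a rough illustration, not a real language detector: the function names `english_score` and `looks_english` and the `threshold` parameter are made up for the example, the word list is the one from the answer with repeats collapsed, and the score is just the fraction of those words found in the text, not a calibrated probability.

```python
import re

# Word list from the answer above, repeats collapsed; it is not a tuned list.
COMMON_ENGLISH_WORDS = frozenset("the a an is to are in on it".split())


def english_score(text: str) -> float:
    """Return the fraction of the common words that occur in `text` (0.0 to 1.0)."""
    # Lowercase and split into simple ASCII word tokens.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return len(COMMON_ENGLISH_WORDS & tokens) / len(COMMON_ENGLISH_WORDS)


def looks_english(text: str, threshold: float = 1.0) -> bool:
    """True if at least `threshold` of the common words are present.

    The default of 1.0 mirrors the answer's rule: all of the words must appear.
    """
    return english_score(text) >= threshold
```

Requiring every word only works on reasonably long documents; for short snippets a lower `threshold` (a hypothetical knob, pick it by testing on your own data) is probably more practical, at the cost of more false positives.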
