How to check if a string looks randomized, or human-generated and pronounceable?

旧巷少年郎 2020-12-13 03:55

For the purpose of identifying [possible] bot-generated usernames.

Suppose you have a username like "bilbomoothof" .. it may be nonsense, but it still contains pro

10 answers
  •  难免孤独
    2020-12-13 04:17

    Look up n-gram analysis. It is successfully used to automatically detect text language and works surprisingly well even on very short texts.

    The online demo (since taken offline) recognized 'bilbomoothof' as English and 'sdfgbhm342r3f' as Nepali. It probably always returns the best match, even if it's a very poor one. I think you could train it to discern between 'pronounceable' and 'random'.
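
A minimal sketch of the n-gram idea, not the demo mentioned above: it trains a character-bigram model on one small word list and scores a string by the average log-probability of each character given the previous one. The word list `SAMPLE_WORDS`, the `FLOOR` probability for unseen bigrams, and the function names are all assumptions made for illustration; a usable model would need a large dictionary and proper smoothing.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real model would be trained on a large word list.
SAMPLE_WORDS = (
    "the quick brown fox jumps over lazy dog moon tooth both home of soft "
    "bit milk oil body some more hot book room other mother smooth bloom "
    "boil toil foot hobby robot motion common bottom"
).split()

# Assumed probability for bigrams never seen in training (a crude stand-in
# for proper smoothing).
FLOOR = 1e-4


def bigrams(word):
    """Return the adjacent character pairs of a lowercased word."""
    word = word.lower()
    return [word[i:i + 2] for i in range(len(word) - 1)]


def train(words):
    """Count each bigram and how often each character starts a bigram."""
    pair_counts = Counter()
    first_counts = Counter()
    for w in words:
        for bg in bigrams(w):
            pair_counts[bg] += 1
            first_counts[bg[0]] += 1
    return pair_counts, first_counts


def score(text, model):
    """Average log-probability of each character given the previous one."""
    pair_counts, first_counts = model
    pairs = bigrams(text)
    if not pairs:
        return math.log(FLOOR)
    total = 0.0
    for bg in pairs:
        seen = pair_counts[bg]
        prob = seen / first_counts[bg[0]] if seen else FLOOR
        total += math.log(prob)
    return total / len(pairs)


model = train(SAMPLE_WORDS)
pronounceable = score("bilbomoothof", model)
random_looking = score("sdfgbhm342r3f", model)
print(pronounceable > random_looking)
```

With this toy corpus the first string scores higher because most of its bigrams occur in the training words, while none of the second string's do. Any cut-off between 'pronounceable' and 'random' would have to be tuned on labeled usernames.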
