How to split a string into words. Ex: “stringintowords” -> “String Into Words”?

粉色の甜心 2020-11-29 20:18

What is the right way to split a string into words? (The string doesn't contain any spaces or punctuation marks.)

For example: "stringintowords" -> "String Into Words"

13 answers
  •  心在旅途
    2020-11-29 20:38

    There should be a fair amount of academic literature on this. The term to search for is "word segmentation". This paper looks promising, for example.

    In general, you'll probably want to learn about Markov models and the Viterbi algorithm. The latter is a dynamic programming algorithm that can find the most likely segmentation of a string without exhaustively testing every possible segmentation. The essential insight is this: if you have n possible segmentations of the first m characters and you only want the single most likely segmentation overall, you don't need to extend every one of them with the subsequent characters; you only need to keep extending the most likely one.
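    A minimal sketch of that idea, assuming a simple unigram word model (each word scored independently by its probability) rather than a full Markov model. The dictionary `WORD_PROBS` and its numbers are made up for illustration, and `segment` / `to_title_words` are hypothetical helper names. For each prefix of the string, the code keeps only the best-scoring segmentation of that prefix plus a back-pointer, then walks the back-pointers to recover the words:

```python
import math

# Hypothetical toy unigram model: word -> probability. A real system would
# estimate these from a large text corpus; these numbers are invented.
WORD_PROBS = {
    "string": 0.02,
    "in": 0.05,
    "into": 0.03,
    "to": 0.05,
    "word": 0.02,
    "words": 0.02,
    "s": 0.0005,
}

# No candidate word can be longer than the longest dictionary entry.
MAX_WORD_LEN = max(len(w) for w in WORD_PROBS)


def segment(text: str) -> list[str]:
    """Return the most probable split of `text` into dictionary words.

    best[i] is the best log-probability of any segmentation of text[:i];
    back[i] is where the last word of that best segmentation starts.
    Only the best segmentation of each prefix is kept and extended.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0  # the empty prefix has probability 1
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            prob = WORD_PROBS.get(text[start:end])
            if prob is None or best[start] == -math.inf:
                continue  # unknown word, or prefix cannot be segmented
            score = best[start] + math.log(prob)
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError(f"no segmentation of {text!r} into known words")
    # Follow the back-pointers from the end of the string to the start.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


def to_title_words(text: str) -> str:
    """Segment `text` and capitalize each word."""
    return " ".join(w.capitalize() for w in segment(text))


print(to_title_words("stringintowords"))  # → String Into Words
```

    Here "into" (0.03) beats "in" + "to" (0.05 × 0.05 = 0.0025), and "words" beats "word" + "s", purely because of the invented probabilities. Conditioning each word on the previous one (a bigram Markov model) would make the DP state (position, previous word) instead of just position, but the keep-only-the-best step stays the same.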
