Parse text into a valid sentence

穿精又带淫゛_ submitted on 2019-12-25 14:29:14

Question


I have a question about how to parse arbitrary text into a valid sentence.

For example, given the text iamjhamb, it should be parsed into i am jhamb.

My approach: I solved this using dynamic programming.
             Make an array T[], where T[i] indicates whether the prefix from 0 to i forms a valid sentence.
             The recurrence is T[i] = 1 iff there exists some j < i with T[j] = 1 and substring(j+1, i) a word in the dictionary
             (treating the empty prefix as valid).

But this approach is not fully correct: it finds every possible word that can be formed from the text, which is not what the question asks for. Please help me correct this approach, or suggest a better one.
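A minimal Python sketch of the DP described above, with one change that addresses the problem: besides the boolean table, it records a back-pointer for each reachable prefix, so one concrete segmentation can be read off at the end instead of only a yes/no answer or a bag of words. The function name `segment` and the toy dictionary are hypothetical, chosen only for illustration.

```python
def segment(text, dictionary):
    """Return one split of `text` into dictionary words, or None if none exists."""
    n = len(text)
    # ok[i] is True when text[:i] can be split into dictionary words
    ok = [False] * (n + 1)
    ok[0] = True  # the empty prefix is trivially valid
    # back[i] records where the last word of text[:i] starts
    back = [-1] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            if ok[j] and text[j:i] in dictionary:
                ok[i] = True
                back[i] = j
                break  # keep the first split point found (longest last word)
    if not ok[n]:
        return None
    # Follow the back-pointers from the end to recover the words
    words = []
    i = n
    while i > 0:
        j = back[i]
        words.append(text[j:i])
        i = j
    return list(reversed(words))


print(" ".join(segment("iamjhamb", {"i", "am", "jhamb"})))  # i am jhamb
```

This returns a single valid segmentation; when several exist (see the aneat example in the answers), which one you get depends on the tie-breaking in the inner loop, so choosing the most plausible one needs an extra scoring step.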

I have one more question: I searched a lot online about suffix arrays, but I didn't find any good tutorial. Could someone explain the concept, or suggest a good link? Thanks in advance.
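For reference, a suffix array is the list of starting positions of all suffixes of a string, sorted lexicographically. The sketch below uses a naive construction (roughly O(n² log n) time and O(n²) memory, since every suffix is materialized as a sort key; practical implementations use faster algorithms such as prefix doubling or SA-IS), plus a binary search that uses the array to test whether a pattern occurs in the string. The function names are hypothetical.

```python
def suffix_array(s):
    """Start indices of all suffixes of s, sorted lexicographically (naive)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def contains(s, sa, pattern):
    """Binary-search the suffix array for a suffix that starts with pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare only the first len(pattern) characters of the suffix
        if s[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # lo is the first suffix whose prefix is >= pattern; check for a match
    return lo < len(sa) and s[sa[lo]:sa[lo] + len(pattern)] == pattern


sa = suffix_array("banana")
print(sa)                             # [5, 3, 1, 0, 4, 2]
print(contains("banana", sa, "nan"))  # True
print(contains("banana", sa, "nab"))  # False
```

Because all suffixes sharing a prefix sit next to each other in sorted order, any substring query becomes a binary search over the array.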


Answer 1:


This problem is known as the word segmentation problem in natural language processing. While this problem rarely arises for English, it is quite common for Arabic or Chinese. You could review the literature on the subject and consider adapting one of the methods to your case.

As for your algorithm, the simplest thing to do would be to enumerate the possible segmentations it produces and select one using a language model. I think a bigram model might suffice for simple sentences.
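A sketch of that idea in Python, under loud assumptions: the dictionary and all bigram/unigram counts below are made-up toy values, and add-one (Laplace) smoothing is just one simple choice among many. It enumerates every segmentation with memoized recursion and keeps the one with the highest bigram log-probability. Note that the number of segmentations can grow exponentially with the text length, so this is only practical for short inputs.

```python
import math
from functools import lru_cache


def all_segmentations(text, dictionary):
    """Return every split of `text` into dictionary words, as tuples."""
    @lru_cache(maxsize=None)
    def from_position(start):
        if start == len(text):
            return [()]  # exactly one way to split the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in from_position(end):
                    results.append((word,) + rest)
        return results
    return from_position(0)


def bigram_log_prob(words, bigrams, unigrams, vocab_size):
    """Log-probability of `words` under an add-one smoothed bigram model."""
    score = 0.0
    prev = "<s>"  # sentence-start token
    for w in words + ("</s>",):  # include the transition to sentence end
        num = bigrams.get((prev, w), 0) + 1
        den = unigrams.get(prev, 0) + vocab_size
        score += math.log(num / den)
        prev = w
    return score


def best_segmentation(text, dictionary, bigrams, unigrams, vocab_size):
    """Pick the highest-scoring segmentation, or None if there is none."""
    candidates = all_segmentations(text, dictionary)
    if not candidates:
        return None
    return max(candidates,
               key=lambda ws: bigram_log_prob(ws, bigrams, unigrams, vocab_size))


# Hypothetical toy counts, as if gathered from some training corpus
DICTIONARY = {"a", "an", "neat", "eat"}
BIGRAMS = {("<s>", "a"): 5, ("a", "neat"): 3, ("neat", "</s>"): 2,
           ("<s>", "an"): 2, ("eat", "</s>"): 1}
UNIGRAMS = {"<s>": 10, "a": 8, "an": 4, "neat": 3, "eat": 2}
VOCAB_SIZE = 6

print(best_segmentation("aneat", DICTIONARY, BIGRAMS, UNIGRAMS, VOCAB_SIZE))
# ('a', 'neat')
```

With real data the counts would come from a corpus, and a better smoothing method would usually be preferred over add-one.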

A suffix tree would allow you to find the possible segmentations more efficiently, but it would not help identify the most likely one unless you use a language model based on suffix trees.




Answer 2:


Have you tried constructing a trie for the string? Read about them here. It will work except in cases where there are multiple choices. For example, aneat can be parsed as a neat or as an eat.
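The answer does not spell out the details; one common reading, assumed here, is to build the trie from the dictionary words and then walk it from each position of the text to list every word that starts there. A minimal sketch using nested dicts (the names `build_trie`, `words_starting_at` and `END` are hypothetical):

```python
END = object()  # sentinel key marking "a dictionary word ends here"


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def words_starting_at(text, start, trie):
    """Return all dictionary words that begin at text[start], shortest first."""
    found = []
    node = trie
    for end in range(start, len(text)):
        node = node.get(text[end])
        if node is None:  # no dictionary word continues with this prefix
            break
        if END in node:
            found.append(text[start:end + 1])
    return found


trie = build_trie(["a", "an", "neat", "eat"])
print(words_starting_at("aneat", 0, trie))  # ['a', 'an']
print(words_starting_at("aneat", 1, trie))  # ['neat']
```

Both a and an match at position 0, which is exactly the ambiguity the answer mentions: the trie finds candidate words quickly, but something else (such as the DP or a language model) still has to choose between the resulting segmentations.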



Source: https://stackoverflow.com/questions/12166250/parse-text-into-valid-sentence
