Counting syllables

流过昼夜 submitted on 2019-12-04 17:38:30

Question


I'm looking to assign some different readability scores to text in R, such as the Flesch-Kincaid score.

Does anyone know of a way to segment words into syllables using R? I don't necessarily need the syllable segments themselves, just a count.

So, for instance:

x <- c('dog', 'cat', 'pony', 'cracker', 'shoe', 'Popsicle')

would yield: 1, 1, 2, 2, 1, 3

Each number corresponds to the number of syllables in the word.


Answer 1:


Some tools for NLP are available here:

http://cran.r-project.org/web/views/NaturalLanguageProcessing.html

The task is non-trivial, though. More hints (including an algorithm you could implement) are in the question "Detecting syllables in a word".




Answer 2:


qdap version 1.1.0 does this task:

library(qdap)
x <- c('dog', 'cat', 'pony', 'cracker', 'shoe', 'Popsicle')
syllable_sum(x)

## [1] 1 1 2 2 1 3



Answer 3:


gsk3 is correct: if you want a correct solution, it is non-trivial.

For example, you have to watch out for oddities like a silent e at the end of a word (e.g. pane), and know when it's not silent, as in finale.

However, if you just want a quick-and-dirty approximation, this will do it:

> nchar( gsub( "[^X]", "", gsub( "[aeiouy]+", "X", tolower( x ))))
[1] 1 1 2 2 1 3

To understand how the parts work, strip away the function calls from the outside in, starting with nchar, then the outer gsub, and so on, until the expression makes sense to you.
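
As a sketch of that reading, the same expression can be written inside-out with one intermediate value per call. The variable names (`lowered`, `marked`, `only_marks`, `counts`) are made up here for illustration and are not part of the original one-liner.

```r
x <- c('dog', 'cat', 'pony', 'cracker', 'shoe', 'Popsicle')

lowered    <- tolower(x)                       # "Popsicle" -> "popsicle"
marked     <- gsub("[aeiouy]+", "X", lowered)  # each run of vowels becomes one X: "pXpsXclX"
only_marks <- gsub("[^X]", "", marked)         # drop everything that is not an X: "XXX"
counts     <- nchar(only_marks)                # number of X's = number of vowel groups

counts
## [1] 1 1 2 2 1 3
```

Using an uppercase X as the marker is safe only because the input was lowercased first, so a literal "x" in a word such as "box" is removed along with the other consonants.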

My guess, though, is that pitting R's power against the profusion of exceptions in the English language, you could get a decent answer (maybe 99% right?) on normal text without a lot of work. Heck, the simple parser above may already get 90%+ right. With a little more work, you could deal with silent e's if you like.

It all depends on your application - whether this is good enough or you need something more accurate.
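
One possible way to handle the silent e mentioned above is sketched below. The function name `count_syllables` and the specific rule are assumptions for illustration: a final e after a consonant is treated as silent unless the word ends in consonant + "le" (as in Popsicle or table), and every word is given at least one syllable.

```r
# Hypothetical helper: vowel-group count with a crude silent-e correction.
count_syllables <- function(words) {
  w <- tolower(words)
  # Count runs of vowels, as in the quick approximation.
  n <- lengths(regmatches(w, gregexpr("[aeiouy]+", w)))
  # Final "e" after a consonant is assumed silent, except in consonant + "le".
  silent_e <- grepl("[^aeiouy]e$", w) & !grepl("[^aeiouy]le$", w)
  # Subtract the silent e, but never report fewer than one syllable.
  pmax(n - silent_e, 1L)
}

count_syllables(c('pane', 'Popsicle', 'shoe', 'the', 'table'))
## [1] 1 3 1 1 2
```

This still gets finale wrong (it returns 2 rather than 3), which illustrates why a fully correct solution needs more than a couple of regular expressions.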




Answer 4:


The koRpus package will help you out immensely, but it's a little difficult to work with.

stopifnot(require(koRpus))
# `text` is assumed to be a character vector holding the text to score
tokens <- tokenize(text, format="obj", lang='en')
flesch.kincaid(tokens)
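
For comparison, here is a rough base-R sketch of the Flesch-Kincaid grade level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. It is not how koRpus computes the score: the helper names `approx_syllables` and `fk_grade` are hypothetical, sentences are split naively on ., ! and ?, and syllables are approximated by counting vowel groups, so results will generally differ from koRpus.

```r
# Hypothetical helper: crude syllable estimate, one per vowel group (minimum 1).
approx_syllables <- function(words) {
  w <- tolower(words)
  pmax(lengths(regmatches(w, gregexpr("[aeiouy]+", w))), 1L)
}

# Hypothetical helper: Flesch-Kincaid grade level for a single string.
fk_grade <- function(text) {
  # Naive sentence split; keep only pieces that contain letters.
  sentences <- unlist(strsplit(text, "[.!?]+"))
  sentences <- sentences[grepl("[[:alpha:]]", sentences)]
  # Words are runs of letters and apostrophes.
  words <- unlist(regmatches(text, gregexpr("[[:alpha:]']+", text)))
  syllables <- sum(approx_syllables(words))
  0.39 * length(words) / length(sentences) +
    11.8 * syllables / length(words) - 15.59
}

fk_grade("The cat sat on the mat. The dog ran.")
```

Very short, simple text like this yields a negative grade level, which is expected from the formula's constant term.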


Source: https://stackoverflow.com/questions/8553240/counting-syllables
