Converting a readability formula into a Python function

六眼飞鱼酱① submitted on 2019-12-02 09:54:38

BTW, there's the textstat library, which already implements the Flesch reading ease score:

from textstat.textstat import textstat
from nltk.corpus import gutenberg

for filename in gutenberg.fileids():
    # Score the text of the file, not the filename string itself.
    print(filename, textstat.flesch_reading_ease(gutenberg.raw(filename)))

If you're set on coding up your own, you first have to:

  • decide whether a punctuation mark counts as a word
  • define how to count the number of syllables in a word.
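To make those two decisions concrete, here's a minimal sketch, assuming punctuation is not counted as a word and syllables are approximated by runs of vowels. `is_word` and `count_syllables` are hypothetical helper names for illustration, not part of nltk or textstat, and the vowel-run heuristic is only a rough approximation:

```python
import re

def is_word(token):
    # Assumption: a token is a word only if it has at least one letter or digit,
    # so pure punctuation tokens like ',' and '.' are excluded.
    return any(ch.isalnum() for ch in token)

def count_syllables(word):
    # Rough heuristic (an assumption, not a linguistic rule): one syllable per
    # run of consecutive vowels, counting 'y' as a vowel, with a minimum of 1.
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

tokens = ['The', 'cat', 'sat', ',', 'quietly', '.']
words = [t for t in tokens if is_word(t)]

print(len(tokens), len(words))               # token count vs. word count
print([count_syllables(w) for w in words])   # per-word syllable estimates
```

Whichever choices you make, apply them consistently to both the word count and the syllable count, since both feed into the score.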

If punctuation counts as a word and syllables are counted by the regex in your question, then:

import re
from itertools import chain
from nltk.corpus import gutenberg

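def num_syllables_per_word(word):
    # Counts each run of lowercase vowels that is followed by at least one
    # non-vowel character. Case-sensitive, and a trailing vowel run
    # (e.g. the 'e' in 'the') is not counted.
    return len(re.findall(r'[aeiou]+[^aeiou]+', word))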

for filename in gutenberg.fileids():
    sents = gutenberg.sents(filename)
    words = gutenberg.words(filename) # i.e. list(chain(*sents))
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(num_syllables_per_word(w) for w in words)
    score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    print(filename, score)
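If you'd rather have this as a reusable function that you can try without downloading a corpus, a sketch along these lines should work. It assumes the input has the same shape as `gutenberg.sents()` (a list of sentences, each a list of tokens), keeps punctuation as words, and reuses the regex above. The name `flesch_reading_ease` and the `sample` data are made up for illustration:

```python
import re

def num_syllables_per_word(word):
    # Same regex as above: vowel runs followed by at least one non-vowel.
    return len(re.findall(r'[aeiou]+[^aeiou]+', word))

def flesch_reading_ease(sents):
    # sents: list of sentences, each a list of tokens (punctuation included).
    words = [w for sent in sents for w in sent]
    if not sents or not words:
        raise ValueError("need at least one sentence and one word")
    num_syllables = sum(num_syllables_per_word(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sents))
            - 84.6 * (num_syllables / len(words)))

# Hypothetical toy input, already tokenized into sentences.
sample = [['the', 'cat', 'sat', '.'], ['it', 'purred', '.']]
print(flesch_reading_ease(sample))
```

With the corpus, you'd call it as `flesch_reading_ease(gutenberg.sents(filename))`.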