bleu

Text Summarization Evaluation - BLEU vs ROUGE

Submitted by 穿精又带淫゛_ on 2020-01-11 16:37:14
Question: With the results of two different summary systems (sys1 and sys2) and the same reference summaries, I evaluated them with both BLEU and ROUGE. The problem is: all ROUGE scores of sys1 were higher than those of sys2 (ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-4, ROUGE-L, ROUGE-SU4, ...), but the BLEU score of sys1 was much lower than that of sys2. So my question is: both ROUGE and BLEU use n-gram overlap to measure the similarity between system summaries and human summaries. So why
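One likely source of the disagreement is that BLEU is precision-oriented while ROUGE is recall-oriented, so the two metrics can rank the same pair of systems in opposite order. A minimal sketch in plain Python (the reference and system outputs below are made-up toy tokens, not the asker's data) showing clipped unigram precision vs. recall flipping the ranking:

```python
from collections import Counter

def clipped_overlap(candidate, reference):
    # Count candidate unigrams, clipping each count by its frequency in the reference.
    cand, ref = Counter(candidate), Counter(reference)
    return sum(min(n, ref[tok]) for tok, n in cand.items())

reference = ['the', 'cat', 'sat', 'on', 'the', 'mat']
sys1 = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'and', 'the', 'dog', 'ran']  # long, full coverage
sys2 = ['the', 'cat', 'sat']                                                  # short, all correct

for name, cand in [('sys1', sys1), ('sys2', sys2)]:
    overlap = clipped_overlap(cand, reference)
    precision = overlap / len(cand)      # BLEU-style: extra words hurt
    recall = overlap / len(reference)    # ROUGE-style: missed words hurt
    print(name, precision, recall)
```

Here sys1 wins on recall (1.0 vs. 0.5) while sys2 wins on precision (1.0 vs. 0.6), so a recall-based ROUGE and a precision-based BLEU legitimately disagree about which system is better.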

NLTK: corpus-level bleu vs sentence-level BLEU score

Submitted by 梦想与她 on 2019-12-18 02:42:59
Question: I have imported nltk in Python to calculate the BLEU score on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works. Below is my code for the corpus-level BLEU score:

    import nltk
    hypothesis = ['This', 'is', 'cat']
    reference = ['This', 'is', 'a', 'cat']
    BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights=[1])
    print(BLEUscore)

For some reason, the BLEU score is 0 for the above code. I was expecting a corpus
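The zero comes from argument nesting: corpus_bleu expects one *list of references* per hypothesis, i.e. one level of nesting more than the code above provides. Passing [reference] makes NLTK treat each token string ('This', 'is', ...) as its own reference sentence, so no unigram ever matches. A minimal corrected sketch (assuming nltk is installed):

```python
from nltk.translate.bleu_score import corpus_bleu

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# corpus_bleu takes a list of reference-lists, one entry per hypothesis,
# so the single reference must be wrapped one level deeper.
score = corpus_bleu([[reference]], [hypothesis], weights=[1])
print(score)  # ~0.7165: unigram precision 1.0 times brevity penalty exp(1 - 4/3)
```

With weights=[1] only unigram precision is used (3/3 = 1.0), and the score below 1 comes entirely from the brevity penalty, since the 3-token hypothesis is shorter than the 4-token reference.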

NLTK: corpus-level bleu vs sentence-level BLEU score

Submitted by 拈花ヽ惹草 on 2019-11-28 23:40:48
I have imported nltk in Python to calculate the BLEU score on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works. Below is my code for the corpus-level BLEU score:

    import nltk
    hypothesis = ['This', 'is', 'cat']
    reference = ['This', 'is', 'a', 'cat']
    BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights=[1])
    print(BLEUscore)

For some reason, the BLEU score is 0 for the above code. I was expecting a corpus-level BLEU score of at least 0.5. Here is my code for the sentence-level BLEU score:

    import nltk
    hypothesis =
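For comparison, sentence_bleu takes its references one nesting level shallower than corpus_bleu, and for a one-sentence corpus the two should agree. A minimal sketch (assuming nltk is installed) contrasting the two call shapes:

```python
from nltk.translate.bleu_score import sentence_bleu, corpus_bleu

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# sentence_bleu: a list of references for a single hypothesis.
single = sentence_bleu([reference], hypothesis, weights=[1])
# corpus_bleu: a list of reference-lists, one per hypothesis.
corpus = corpus_bleu([[reference]], [hypothesis], weights=[1])
print(single, corpus)  # identical for a one-sentence corpus
```

So the sentence-level call "works" and the corpus-level call returns 0 only because the corpus-level version was fed the sentence-level nesting.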