Classifying single short texts (noise filtering)

Anonymous (unverified), submitted 2019-12-03 00:37:01

VERB - verbs (all tenses and modes)
NOUN - nouns (common and proper)
PRON - pronouns (personal pronouns)
ADJ - adjectives
ADV - adverbs
ADP - adpositions (prepositions and postpositions)
CONJ - conjunctions
DET - determiners
NUM - cardinal numbers
PRT - particles or other function words
X - other: foreign words, typos, abbreviations
. - punctuation

Useless tweets (adjacent tag pairs and their counts):

[(('NOUN', 'NOUN'), 2575), (('PRON', 'VERB'), 1498), (('NOUN', 'VERB'), 1268), (('DET', 'NOUN'), 1047), (('VERB', 'VERB'), 981), (('ADJ', 'NOUN'), 873), (('VERB', 'PRON'), 853), (('NOUN', 'ADP'), 765), (('VERB', 'NOUN'), 760), (('VERB', 'ADV'), 626)]
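A list like the one above can be produced by counting adjacent POS-tag pairs within each tweet. The following is a minimal sketch, not the original author's code: the function name `tag_bigram_counts` and the hand-tagged sample are hypothetical, and it assumes the tweets have already been tokenized and tagged with the 12-tag universal tagset (for example with NLTK's `nltk.pos_tag(tokens, tagset="universal")`, which is an assumption about the tooling used).

```python
from collections import Counter


def tag_bigram_counts(tagged_tweets):
    """Count adjacent POS-tag pairs, never pairing tags across tweets.

    tagged_tweets: iterable of tweets, each a list of (word, tag) tuples.
    """
    counts = Counter()
    for tagged in tagged_tweets:
        tags = [tag for _word, tag in tagged]
        # zip(tags, tags[1:]) yields (tag_i, tag_{i+1}) for each position
        counts.update(zip(tags, tags[1:]))
    return counts


# Hypothetical hand-tagged sample; real input would come from a POS tagger.
sample = [
    [("officer", "NOUN"), ("dies", "VERB"), ("in", "ADP"), ("Melbourne", "NOUN")],
    [("I", "PRON"), ("love", "VERB"), ("it", "PRON")],
]

counts = tag_bigram_counts(sample)
top10 = counts.most_common(10)  # same shape as the lists in this post
```

Counting per tweet rather than over one concatenated stream avoids creating pairs that span the boundary between two unrelated tweets.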

Useful tweets (adjacent tag pairs and their counts):

[(('NOUN', 'NOUN'), 3042), (('ADP', 'NOUN'), 1350), (('NOUN', 'VERB'), 1310), (('NOUN', 'ADP'), 945), (('VERB', 'ADP'), 669), (('VERB', 'NOUN'), 462), (('DET', 'NOUN'), 427), (('NUM', 'NOUN'), 413), (('ADJ', 'NOUN'), 378), (('ADP', 'DET'), 239)]
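To make the difference between the two lists easier to see, the sketch below compares them directly. It uses only the counts shown above; the helper name `shares` is hypothetical, and the shares are computed within each top-10 list only, so they are not proportions of all tag pairs in the corpus.

```python
useless = [(('NOUN', 'NOUN'), 2575), (('PRON', 'VERB'), 1498), (('NOUN', 'VERB'), 1268),
           (('DET', 'NOUN'), 1047), (('VERB', 'VERB'), 981), (('ADJ', 'NOUN'), 873),
           (('VERB', 'PRON'), 853), (('NOUN', 'ADP'), 765), (('VERB', 'NOUN'), 760),
           (('VERB', 'ADV'), 626)]
useful = [(('NOUN', 'NOUN'), 3042), (('ADP', 'NOUN'), 1350), (('NOUN', 'VERB'), 1310),
          (('NOUN', 'ADP'), 945), (('VERB', 'ADP'), 669), (('VERB', 'NOUN'), 462),
          (('DET', 'NOUN'), 427), (('NUM', 'NOUN'), 413), (('ADJ', 'NOUN'), 378),
          (('ADP', 'DET'), 239)]


def shares(top):
    """Map each tag pair to its share of the total count in this list."""
    total = sum(count for _pair, count in top)
    return {pair: count / total for pair, count in top}


useless_share = shares(useless)
useful_share = shares(useful)

# Pairs that appear in only one of the two top-10 lists
only_useful = [pair for pair in useful_share if pair not in useless_share]
only_useless = [pair for pair in useless_share if pair not in useful_share]

print("only in useful: ", only_useful)
print("only in useless:", only_useless)
```

The pairs found only in the useful list all involve ADP or NUM, while those found only in the useless list all involve VERB combined with PRON, VERB, or ADV. This is consistent with the example tweets that follow, where a preposition such as "in" introduces a place, an event, or a time span.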

Followed by a location:

RT @theheraldsun: AFP officer dies in Melbourne CBD shooting >> https://t.co/afnSM4TXkS https://t.co/IAjTSx0zMr

Followed by an action:

RT @KMPHFOX26: #BREAKING KCSO: 1 dead, others hurt in shooting at a Bakersfield's casino. https://t.co/TCkiKdV1cp

Followed by a time:

RT @Asipatravana: So that's 3 bomb explosions, 1 attempted kidnapping, & 2 shootings in 2 days in Sweden. And we are expected to beli… 
