text-chunking

How to extract chunks from BIO chunked sentences? - python

Submitted by 久未见 on 2020-01-02 02:06:09
Given an input sentence with BIO chunk tags: [('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'), ('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'), ('?', 'O')] I need to extract the relevant phrases, e.g. to extract 'NP' I need the fragments of tuples tagged B-NP and I-NP. [out]: [('What', '0'), ('the airspeed', '2-3'), ('an unladen swallow', '5-6-7')] (Note: the numbers in the extracted tuples represent the token indices.) I have tried extracting it using the following code: def extract_chunks(tagged
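One possible sketch of such an extractor (not the asker's original `extract_chunks`; a plain-Python loop that groups contiguous `B-`/`I-` runs of the requested chunk type):

```python
def extract_chunks(tagged_tokens, chunk_type="NP"):
    """Collect contiguous B-/I- runs of chunk_type as (phrase, index-string) tuples."""
    chunks = []
    current_tokens, current_idx = [], []

    def flush():
        # Close off the chunk accumulated so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), "-".join(current_idx)))

    for i, (token, tag) in enumerate(tagged_tokens):
        if tag == "B-" + chunk_type:
            # A new chunk starts; emit the previous one first.
            flush()
            current_tokens, current_idx = [token], [str(i)]
        elif tag == "I-" + chunk_type and current_tokens:
            # The current chunk continues.
            current_tokens.append(token)
            current_idx.append(str(i))
        else:
            # Any other tag ends the current chunk.
            flush()
            current_tokens, current_idx = [], []
    flush()
    return chunks

sent = [('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
        ('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'), ('swallow', 'I-NP'),
        ('?', 'O')]
print(extract_chunks(sent))
# → [('What', '0'), ('the airspeed', '2-3'), ('an unladen swallow', '5-6-7')]
```

Passing `chunk_type="PP"` or `"VP"` extracts the other phrase types the same way.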

Python (NLTK) - more efficient way to extract noun phrases?

Submitted by 二次信任 on 2019-11-29 10:38:14
I've got a machine learning task involving a large amount of text data. I want to identify and extract noun phrases in the training text so I can use them for feature construction later in the pipeline. I've extracted the type of noun phrases I wanted, but I'm fairly new to NLTK, so I approached the problem by breaking each step down into list comprehensions, as you can see below. My real question is: am I reinventing the wheel here? Is there a faster way to do this that I'm not seeing? import nltk import pandas as pd myData = pd.read_excel("\User\train_.xlsx")
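A compact sketch of the usual NLTK route (a `RegexpParser` chunk grammar over POS-tagged tokens). The sentence below is pre-tagged by hand so the example has no tagger-model dependency; in practice the tags would come from `nltk.pos_tag`:

```python
import nltk

# Hand-supplied POS tags standing in for nltk.pos_tag output.
tagged = [('The', 'DT'), ('little', 'JJ'), ('yellow', 'JJ'), ('dog', 'NN'),
          ('barked', 'VBD'), ('at', 'IN'), ('the', 'DT'), ('cat', 'NN')]

# Optional determiner, any adjectives, one or more nouns (NN, NNS, NNP, ...).
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

tree = chunker.parse(tagged)
nps = [" ".join(tok for tok, _ in sub.leaves())
       for sub in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(nps)  # → ['The little yellow dog', 'the cat']
```

Because the grammar compiles once and `parse` is a single pass per sentence, this tends to be faster than re-tagging or re-scanning inside nested list comprehensions.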

How to use nltk regex pattern to extract a specific phrase chunk?

Submitted by 本秂侑毒 on 2019-11-28 21:42:53
I have written the following regex pattern to tag certain phrases: pattern = """ P2: {<JJ>+ <RB>? <JJ>* <NN>+ <VB>* <JJ>*} P1: {<JJ>? <NN>+ <CC>? <NN>* <VB>? <RB>* <JJ>+} P3: {<NP1><IN><NP2>} P4: {<NP2><IN><NP1>} """ This pattern correctly tags a phrase such as a = 'The pizza was good but pasta was bad', giving the desired output with two phrases: 'pizza was good' and 'pasta was bad'. However, if my sentence is something like a = 'The pizza was awesome and brilliant', it matches only the phrase 'pizza was awesome' instead of the desired 'pizza was awesome and brilliant'. How do I incorporate the
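One way to admit the coordination is a grouped tail in the rule: `RegexpParser` tag patterns accept parenthesised groups, so `(<CC> <JJ>+)*` lets the chunk swallow "and brilliant". A minimal sketch (a simplified single rule, not the asker's full P1-P4 grammar, over hand-tagged input):

```python
import nltk

# Hand-supplied POS tags for 'The pizza was awesome and brilliant'.
tagged = [('The', 'DT'), ('pizza', 'NN'), ('was', 'VBD'),
          ('awesome', 'JJ'), ('and', 'CC'), ('brilliant', 'JJ')]

# The trailing (<CC> <JJ>+)* group admits coordinated adjectives.
grammar = r"P1: {<NN>+ <VB.*>? <JJ>+ (<CC> <JJ>+)*}"
chunker = nltk.RegexpParser(grammar)

tree = chunker.parse(tagged)
phrases = [" ".join(tok for tok, _ in sub.leaves())
           for sub in tree.subtrees(filter=lambda t: t.label() == "P1")]
print(phrases)  # → ['pizza was awesome and brilliant']
```

The same `(<CC> <JJ>+)*` (or `(<CC> <NN>+)*` for nouns) tail can be appended to the original P1/P2 rules without changing what they already match.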
