nlp

Split Sentences at Bullets and Numbering?

坚强是说给别人听的谎言 Submitted on 2020-06-09 03:42:24
Question: I am trying to input text into my word processor to be split into sentences first and then into words. An example paragraph: When the blow was repeated, together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner. 1) This a numbered sentence 2) This is the second numbered sentence At the same time with his ears and his eyes he offered a small prayer to the child. Below are the examples - This an example of bullet point sentence - This
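
The excerpt is cut off before any answer, so here is a minimal, hedged sketch of one common approach in Python with NLTK (an illustrative assumption, not the thread's accepted solution): pre-split the text wherever a numbered marker like "1)" or a dash bullet begins, then run the ordinary sentence and word tokenizers on each chunk.

import re
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt')  # tokenizer models, needed once

text = ("When the blow was repeated, together with an admonition in childish "
        "sentences, he turned over upon his back. 1) This a numbered sentence "
        "2) This is the second numbered sentence - This an example of bullet point sentence")

# Split before each numbered marker like "1)" or a dash bullet "- ";
# the zero-width lookarounds keep each marker attached to its own chunk.
chunks = [c.strip() for c in re.split(r'(?=\b\d+\)\s)|(?<=\s)(?=-\s)', text) if c.strip()]

sentences = [s for chunk in chunks for s in sent_tokenize(chunk)]
words = [word_tokenize(s) for s in sentences]
print(sentences)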

BERT fine-tuned for semantic similarity

夙愿已清 Submitted on 2020-06-08 12:31:33
Question: I would like to fine-tune BERT to calculate semantic similarity between sentences. I searched many websites but found almost nothing about this downstream task; I only found the STS benchmark. I wonder whether I can use the STS benchmark dataset to train a fine-tuned BERT model and apply it to my task. Is that reasonable? As far as I know, there are many methods for calculating similarity, including cosine similarity, Pearson correlation, Manhattan distance, etc. How do I choose one for semantic similarity? Answer 1: As a
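
The answer is truncated above. One standard way to do this, sketched below with the sentence-transformers library (Sentence-BERT), is to reuse or fine-tune a BERT model trained with a siamese objective on NLI/STS data and compare sentence embeddings by cosine similarity; the model name is an illustrative assumption, and any STS-tuned checkpoint works.

from sentence_transformers import SentenceTransformer, util

# 'all-MiniLM-L6-v2' is an illustrative choice, not the thread's recommendation
model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = ["A man is playing a guitar.", "Someone is playing an instrument."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity is the usual metric for these embeddings
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))

On the metric question: cosine similarity is what you compute between two sentences, while Pearson correlation is what you report when evaluating predicted scores against STS gold labels, so the two are not interchangeable options for the same job.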

spaCy - lemmatization on pronouns gives some erroneous output

元气小坏坏 Submitted on 2020-06-01 06:01:05
Question: Lemmatization of pronouns via [token.lemma_ for token in doc] returns -PRON- as the lemma for every pronoun. Is this a bug? Answer 1: No, this is in fact intended behaviour. See the documentation: Unlike verbs and common nouns, there's no clear base form of a personal pronoun. Should the lemma of "me" be "I", or should we normalize person as well, giving "it" — or maybe "he"? spaCy's solution is to introduce a novel symbol, -PRON-, which is used as the lemma for all personal pronouns. It
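
A minimal reproduction, assuming spaCy v2.x (where this behaviour lives; v3 later removed the -PRON- placeholder, and pronouns now lemmatize to themselves):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I told him to give me the book.")
print([(token.text, token.lemma_) for token in doc])
# Under spaCy v2 the pronouns all share one lemma:
# [('I', '-PRON-'), ('told', 'tell'), ('him', '-PRON-'), ('to', 'to'),
#  ('give', 'give'), ('me', '-PRON-'), ('the', 'the'), ('book', 'book'), ('.', '.')]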

Patterns with multi-term entries in the IN attribute

我的梦境 Submitted on 2020-06-01 05:36:10
Question: I am extending a spaCy model using rules. While looking through the documentation, I noticed the IN attribute, which matches a token attribute against a list of allowed values. This is great; however, it only works on single tokens. For example, the pattern {"label":"EXAMPLE","pattern":[{"LOWER": {"IN": ["such as", "like", "for example"]}}]} will only match the term like, not the others. What is the best way to achieve the same result for multi-term entries? Answer 1: It depends on how
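
The answer is truncated, but the limitation follows from how the matcher works: each dict in a pattern describes exactly one token, so IN compares a single token's attribute against the list, and a phrase like "such as" can never match one token. The usual fix is one pattern per phrase, with one token dict per word. A hedged sketch using spaCy v3's entity ruler (in v2 you would construct EntityRuler(nlp) and add it with nlp.add_pipe):

import spacy

nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("entity_ruler")

# One pattern per phrase; multi-word phrases get one token dict per word
patterns = [
    {"label": "EXAMPLE", "pattern": [{"LOWER": "like"}]},
    {"label": "EXAMPLE", "pattern": [{"LOWER": "such"}, {"LOWER": "as"}]},
    {"label": "EXAMPLE", "pattern": [{"LOWER": "for"}, {"LOWER": "example"}]},
]
ruler.add_patterns(patterns)

doc = nlp("Citrus fruits such as oranges are sweet, like honey for example.")
print([(ent.text, ent.label_) for ent in doc.ents])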

How to initialize second GloVe model with solution from first?

一世执手 Submitted on 2020-05-30 03:38:38
Question: I am trying to implement one of the solutions to the question How to align two GloVe models in text2vec?. I don't understand what the proper input values for GlobalVectors$new(..., init = list(w_i, w_j)) are. How do I ensure the values for w_i and w_j are correct? Here's a minimal reproducible example. First, prepare some corpora to compare, taken from the quanteda tutorial. I am using dfm_match(all_words) to try to ensure all words are present in each set, but this doesn't seem to
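
The example is cut off above. For context: in text2vec, fit_transform() on a GloVe model returns the main word vectors and model$components holds the context vectors, so the w_i/w_j warm-start in the quoted call presumably refers to those two matrices from the first fit. As a language-agnostic illustration of the alternative route (aligning the two finished models rather than warm-starting the second fit, a different technique named plainly as such), here is a hedged Python sketch of orthogonal Procrustes alignment; the matrices are random stand-ins, not real GloVe output:

import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)

# Stand-ins for two models' vectors over a shared vocabulary, rows in the same word order
emb_a = rng.normal(size=(1000, 50))  # model A: 1000 words x 50 dims
emb_b = rng.normal(size=(1000, 50))  # model B: same words, same row order

# Find the rotation R that best maps model B's space onto model A's
R, _ = orthogonal_procrustes(emb_b, emb_a)
emb_b_aligned = emb_b @ R

# After alignment, a word's vectors from A and B live in one space and can be compared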

"NLTK was unable to find the java file!" for Stanford POS Tagger

徘徊边缘 Submitted on 2020-05-26 05:06:19
Question: I have been stuck trying to get the Stanford POS Tagger to work for a while. From an old SO post I found the following (slightly modified) code:

stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/'
from nltk.tag import StanfordPOSTagger
#from nltk.tag.stanford import StanfordPOSTagger  # I tried it both ways
from nltk import word_tokenize
# Add the jar and model via their path (instead of setting environment variables):
jar = stanford_dir + 'stanford-postagger.jar'
model = stanford_dir
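
The excerpt cuts off at the model path, and the error in the title almost always means NLTK cannot locate the java executable. A hedged completion under two stated assumptions: the model filename is the one shipped in the models/ folder of the 2017 distribution, and the Java path is a placeholder you must adjust (the truncated 'C:/Users/...' prefix is kept exactly as the question left it).

import os
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize

stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/'  # path truncated in the question
jar = stanford_dir + 'stanford-postagger.jar'
model = stanford_dir + 'models/english-bidirectional-distsim.tagger'  # assumed model file

# The usual fix for "NLTK was unable to find the java file!":
# point JAVAHOME at your java executable (placeholder path, adjust to your JDK)
os.environ['JAVAHOME'] = 'C:/Program Files/Java/jdk1.8.0_161/bin/java.exe'

tagger = StanfordPOSTagger(model, jar)
print(tagger.tag(word_tokenize("This is a test sentence.")))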

How to access Dialogflow V2 API from a webpage?

烂漫一生 Submitted on 2020-05-25 09:31:13
Question: I have a webpage where I want to use a Dialogflow chatbot. This is a custom chat window, so I don't want to use the one-click integration. I am able to access the chat agent's V1 API using JavaScript/Ajax (by passing the client access token in the request header), but I don't know how to do it with the V2 API. The Dialogflow documentation is not clear to me (I have set up authentication by referring to this link, but I don't know how to proceed further). I'm not familiar with Google Cloud either. So a working sample
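
The answer is not shown above, but the key V2 change is documented: client access tokens are gone, replaced by OAuth2 service-account credentials, and those credentials must never ship to the browser. The usual pattern is therefore a small backend that holds the key and proxies detectIntent calls, with the webpage sending its Ajax requests to that backend instead of to Dialogflow directly. A hedged server-side sketch in Python; the project ID, key filename, and session handling are placeholder assumptions:

import uuid
import requests
import google.auth.transport.requests
from google.oauth2 import service_account

PROJECT_ID = 'my-agent-project'        # placeholder GCP project ID
KEY_FILE = 'service-account-key.json'  # placeholder key downloaded from the GCP console

credentials = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=['https://www.googleapis.com/auth/cloud-platform'])
credentials.refresh(google.auth.transport.requests.Request())  # fetch a Bearer token

session_id = str(uuid.uuid4())  # one session per end-user conversation
url = ('https://dialogflow.googleapis.com/v2/projects/%s/agent/sessions/%s:detectIntent'
       % (PROJECT_ID, session_id))
body = {'queryInput': {'text': {'text': 'Hello', 'languageCode': 'en'}}}

resp = requests.post(url, json=body,
                     headers={'Authorization': 'Bearer ' + credentials.token})
print(resp.json()['queryResult']['fulfillmentText'])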
