nlp

Testing text classification ML model with new data fails

Submitted by 旧时模样 on 2020-12-23 18:06:03
Question: I have built a machine learning model to classify emails as spam or not. Now I want to test my own email and see the result, so I wrote the following code to classify the new email:

message = """Subject: Hello this is from google security team we want to recover your password. Please contact us as soon as possible"""
message = pd.Series([message,])
transformed_message = CountVectorizer(analyzer=process_text).fit_transform(message)
proba = model.predict_proba(transformed_message)[0]

Knowing
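The likely failure here is a common one: fitting a fresh CountVectorizer on the new message builds a brand-new vocabulary, so the resulting matrix has different columns than the one the model was trained on. A minimal sketch of the fix, using a toy corpus and classifier in place of the asker's (not shown) training data, process_text analyzer, and model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data standing in for the asker's email corpus (hypothetical).
train_texts = ["win money now", "meeting at noon", "free prize claim", "lunch tomorrow"]
train_labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

# Fit the vectorizer ONCE on the training data and keep it around.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
model = MultinomialNB().fit(X_train, train_labels)

# For a new message, only call transform() on the already-fitted vectorizer.
# Fitting a fresh CountVectorizer would build a different vocabulary, so the
# columns would no longer line up with what the model expects.
new_message = ["claim your free prize money"]
X_new = vectorizer.transform(new_message)
proba = model.predict_proba(X_new)[0]
print(proba)  # probabilities for [ham, spam]
```

The same pattern applies with a custom analyzer: pass it to the one vectorizer that is fitted on the training set, then reuse that fitted object for every new message.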

gensim most_similar with positive and negative, how does it work?

Submitted by 自作多情 on 2020-12-15 06:49:10
Question: I was reading this answer, which says about Gensim's most_similar: it performs vector arithmetic: adding the positive vectors, subtracting the negative, then, from that resulting position, listing the known vectors closest to that angle. But when I tested it, that is not the case. I trained a Word2Vec model on Gensim's "text8" dataset and tested these two:

model.most_similar(positive=['woman', 'king'], negative=['man'])
>>> [('queen', 0.7131118178367615), ('prince', 0.6359186768531799),...]

model.wv
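Part of the discrepancy usually comes from two details of most_similar: it combines the *unit-normalized* input vectors, and it excludes the input words themselves from the results. A NumPy sketch of that behavior, using made-up toy vectors rather than a trained model:

```python
import numpy as np

# Toy word vectors standing in for a trained Word2Vec model (hypothetical values).
vecs = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.8, 0.9, 0.1]),
    "prince": np.array([0.9, 0.7, 0.15]),
    "man":    np.array([0.9, 0.1, 0.2]),
    "woman":  np.array([0.8, 0.2, 0.2]),
}

def unit(v):
    return v / np.linalg.norm(v)

def most_similar(positive, negative, topn=2):
    # Combine the UNIT-NORMALIZED input vectors, not the raw ones...
    target = sum(unit(vecs[w]) for w in positive) - sum(unit(vecs[w]) for w in negative)
    target = unit(target)
    # ...and exclude the input words themselves from the candidates.
    exclude = set(positive) | set(negative)
    scored = [(w, float(np.dot(unit(v), target)))
              for w, v in vecs.items() if w not in exclude]
    return sorted(scored, key=lambda x: -x[1])[:topn]

result = most_similar(positive=["woman", "king"], negative=["man"])
print(result)  # 'queen' ranks first with these toy vectors
```

Doing the raw arithmetic by hand (adding unnormalized vectors and not excluding the inputs) gives different rankings, which is a frequent source of the "that is not the case" impression.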

Sentence structure analysis

Submitted by 限于喜欢 on 2020-12-15 01:41:40
Question: I am trying to look at the structural similarity of sentences, specifically the positions of verbs, adjectives, and nouns. For instance, I have three (or more) sentences that look as follows: "I ate an apple pie, yesterday." "I ate an orange, yesterday." "I eat a lemon, today." All of them start with a pronoun (I) followed by a verb (ate/eat) and a noun (apple pie, orange, lemon) and, finally, an adverb (yesterday/today). I would like to know if there is a way to identify the structure, i.e.
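One workable approach is to reduce each sentence to its sequence of POS tags and compare the sequences. A standard-library sketch, with the tags hard-coded here as a tagger such as spaCy would produce them (so the example is self-contained):

```python
import difflib

# POS-tag sequences assumed to come from a tagger (e.g. spaCy's en_core_web_sm);
# hard-coded here to keep the sketch self-contained.
sent_tags = {
    "I ate an apple pie, yesterday.": ["PRON", "VERB", "DET", "NOUN", "NOUN", "ADV"],
    "I ate an orange, yesterday.":    ["PRON", "VERB", "DET", "NOUN", "ADV"],
    "I eat a lemon, today.":          ["PRON", "VERB", "DET", "NOUN", "ADV"],
}

def structure_similarity(tags_a, tags_b):
    # Ratio in [0, 1]; 1.0 means identical tag sequences.
    return difflib.SequenceMatcher(None, tags_a, tags_b).ratio()

sents = list(sent_tags)
sim = structure_similarity(sent_tags[sents[1]], sent_tags[sents[2]])
print(sim)  # identical structures -> 1.0
```

Because the comparison runs over tag sequences rather than words, "I ate an orange, yesterday" and "I eat a lemon, today" come out structurally identical even though they share almost no vocabulary.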

How to view the tf-idf score for each word

Submitted by 我是研究僧i on 2020-12-13 05:56:40
Question: I was trying to see the tf-idf score of each word in my document. However, it only returns the values in a matrix, whereas I want the tf-idf score shown against each word. The code works, but I want to change the way the result is presented:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
bow_transformer = CountVectorizer(analyzer=text_process).fit(df["comments"].head())

Extract Main and Subordinate Clauses from a German Sentence with spaCy

Submitted by 房东的猫 on 2020-12-12 06:28:18
Question: In German, how can I extract the main and subordinate clauses (aka "dependent clauses") from a sentence with spaCy? I know how to use spaCy's tokenizer, part-of-speech tagging, and dependency parser, but I cannot figure out how to represent the grammatical rules of German using the information spaCy can extract.

Answer 1: The problem can be divided into two tasks: 1. splitting the sentence into its constituent clauses, and 2. identifying which of the clauses is a main clause and which
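For task 2, a rough heuristic (a simplification that ignores separable verb prefixes and other complications) is that German main clauses are verb-second while subordinate clauses are verb-final. A sketch operating on coarse POS-tag lists assumed to come from spaCy's German model, so it runs without the model installed:

```python
# Classify one already-split, already-tagged clause as main or subordinate.
# Heuristic only: German main clauses place the finite verb early (V2),
# subordinate clauses place it last. Real text needs more care (separable
# prefixes, coordination, verb clusters).
def classify_clause(tags):
    """tags: list of coarse POS tags (e.g. spaCy's token.pos_) for one clause."""
    if "VERB" not in tags and "AUX" not in tags:
        return "no finite verb"
    last_verb = max(i for i, t in enumerate(tags) if t in ("VERB", "AUX"))
    return "subordinate" if last_verb == len(tags) - 1 else "main"

# "Ich gehe nach Hause" -> verb in second position -> main clause
print(classify_clause(["PRON", "VERB", "ADP", "NOUN"]))
# "weil es regnet" -> verb in final position -> subordinate clause
print(classify_clause(["SCONJ", "PRON", "VERB"]))
```

With spaCy, the tag lists would come from iterating over the tokens of each clause span (token.pos_), and the clause splitting itself (task 1) can lean on dependency labels such as the subordinating-conjunction attachment.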

Keyword in context (kwic) for skipgrams?

Submitted by 蓝咒 on 2020-12-12 02:07:06
Question: I do keyword-in-context analysis with quanteda for ngrams and tokens, and it works well. I now want to do it for skipgrams, to capture the context of "barriers to entry" but also "barriers to [...] [and] entry". The following code produces a kwic object which is empty, but I don't know what I did wrong. dcc.corpus refers to the text document. I also used the tokenized version, but nothing changes. The result is: "kwic object with 0 rows"

x <- tokens("barriers entry")
ntoken_test <- tokens_ngrams(x, n = 2,
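Independently of the quanteda-specific fix, it can help to see what a skip bigram pattern actually matches. A Python sketch (a hypothetical helper, not quanteda's implementation) of the pairs a call like tokens_ngrams(x, n = 2, skip = 0:2) would generate from a token stream:

```python
from itertools import combinations

def skipgrams(tokens, n=2, max_skip=2):
    """All n-grams allowing up to max_skip skipped tokens between the first
    and last chosen positions (mirrors skip = 0:2 for bigrams)."""
    grams = []
    for idxs in combinations(range(len(tokens)), n):
        # Gap beyond strict adjacency between the chosen positions.
        if idxs[-1] - idxs[0] - (n - 1) <= max_skip:
            grams.append("_".join(tokens[i] for i in idxs))
    return grams

toks = ["barriers", "to", "market", "entry"]
grams = skipgrams(toks)
print(grams)  # includes "barriers_entry", matching "barriers to [...] entry"
```

The point of the sketch: "barriers_entry" only appears among the skip bigrams of the *document* tokens, so the kwic pattern must be matched against the skip-gram-tokenized corpus, not against a two-token phrase tokenized on its own.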