text-analysis

Customizing the Named Entity Recognition model in Azure ML

谁都会走 posted on 2021-01-28 02:17:53
Question: Can we customize the Named Entity Recognition (NER) model in Azure ML Studio with a separate training dataset? What I want to do is find non-English names in a text. (The training dataset includes the set of names that will be used for training.) Answer 1: Unfortunately, this module's ability to perform NER with a custom set of entities is planned for the future but is not currently available. If you're familiar with Python and willing to put in the extra footwork, you might consider using the

Calculate correlation coefficient between words?

孤街浪徒 posted on 2021-01-27 06:32:09
Question: For a text analysis program, I would like to analyze the co-occurrence of certain words in a text. For example, I would like to see that the words "Barack" and "Obama" appear together more often (i.e. have a positive correlation) than other pairs. This does not seem to be that difficult. However, to be honest, I only know how to calculate the correlation between two numbers, not between two words in a text. How can I best approach this problem? How can I calculate the correlation between
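One common way to turn this into a correlation between numbers is to treat each word as a 0/1 indicator per sentence (present or absent) and compute the Pearson correlation of the two indicators, which for binary data is the phi coefficient. The following is a minimal sketch under that assumption; the function name `word_correlation`, the sentence-level window, and the simple regex tokenizer are illustrative choices, not something stated in the question.

```python
import math
import re


def word_correlation(text, word_a, word_b):
    """Phi coefficient between the per-sentence presence of two words.

    Assumption: a "co-occurrence" means both words appear in the same
    sentence. Sentences are split on ., ! and ? for simplicity.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    a, b = word_a.lower(), word_b.lower()

    # 2x2 contingency counts: n11 = both present, n00 = both absent, etc.
    n11 = n10 = n01 = n00 = 0
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        has_a, has_b = a in tokens, b in tokens
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1

    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        # One of the words appears in every sentence or in none;
        # the correlation is undefined, so report 0.0 here.
        return 0.0
    return (n11 * n00 - n10 * n01) / denominator


sample = "Barack Obama spoke. Obama and Barack met. The weather is nice. It rained."
print(word_correlation(sample, "Barack", "Obama"))
print(word_correlation(sample, "Barack", "weather"))
```

A value near 1 means the two words tend to appear in the same sentences, a value near -1 means they tend to avoid each other, and a value near 0 means no linear association. Other window sizes (paragraph, fixed number of tokens) or measures such as pointwise mutual information are equally plausible choices.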

How do I solve the following error? Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

本秂侑毒 posted on 2020-02-20 05:06:28
Question: I am working on an R project. The data set I used is available at the following link: https://www.kaggle.com/ranjitha1/hotel-reviews-city-chennai/data The code I used is: df1 = read.csv("chennai.csv", header = TRUE) library(tidytext) tidy_books <- df1 %>% unnest_tokens(word, Review_Text) Here Review_Text is the text column. Yet I get the following error: Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

Very simple text classification by machine learning? [duplicate]

匆匆过客 posted on 2020-01-30 13:41:31
Question: Possible Duplicate: Text Classification into Categories. I am currently working on a solution to determine the type of food served by each of 10k restaurants in a database, based on their descriptions. I'm using lists of keywords to decide which kind of food is being served. I have read a little about machine learning but have no practical experience with it at all. Can anyone explain to me if/why it would be a better solution to a simple

Text analysis-Unable to write output of Python program in csv or xls file

耗尽温柔 posted on 2020-01-15 10:59:12
Question: Hi, I am trying to do a sentiment analysis using a Naive Bayes classifier in Python 2.x. It reads the sentiments from a txt file and then outputs positive or negative based on the sample txt file sentiments. I want the output in the same form as the input, e.g. I have a text file of, let's say, 1000 raw sentiments and I want the output to show positive or negative against each sentiment. Please help. Below is the code I am using: import math import string def Naive_Bayes_Classifier(positive,
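The output format described above (one label next to each input line) maps naturally onto the standard-library `csv` module. The sketch below is written for Python 3 rather than the Python 2.x mentioned in the question, and the `classify` function is a hypothetical keyword stand-in for the asker's Naive Bayes classifier, whose code is cut off in this excerpt.

```python
import csv


def classify(sentence):
    """Hypothetical placeholder for the real Naive Bayes classifier.

    Assumption: the real classifier takes one sentence and returns
    the string "positive" or "negative".
    """
    negative_cues = {"bad", "poor", "terrible"}
    words = set(sentence.lower().split())
    return "negative" if words & negative_cues else "positive"


def label_file(input_path, output_path):
    """Write each non-empty input line and its label as one CSV row."""
    rows_written = 0
    with open(input_path, encoding="utf-8") as source, \
            open(output_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.writer(target)
        writer.writerow(["text", "sentiment"])  # header row
        for line in source:
            text = line.strip()
            if not text:
                continue  # skip blank lines
            writer.writerow([text, classify(text)])
            rows_written += 1
    return rows_written
```

Calling `label_file("sentiments.txt", "labelled.csv")` (both file names are placeholders) would produce a file that opens directly in a spreadsheet program. Passing `newline=""` when opening the output file is what the `csv` documentation recommends to avoid blank rows on Windows.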

How to combine TFIDF features with other features

霸气de小男生 posted on 2020-01-12 04:44:07
Question: I have a classic NLP problem: I have to classify a news item as fake or real. I have created two sets of features: A) bigram Term Frequency-Inverse Document Frequency (TF-IDF); B) approximately 20 features associated with each document, obtained using pattern.en (https://www.clips.uantwerpen.be/pages/pattern-en), such as subjectivity of the text, polarity, #stopwords, #verbs, #subject, grammatical relations, etc. What is the best way to combine the TF-IDF features with the other features for a single prediction?
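One straightforward option is to concatenate the two feature sets column-wise, so that each document becomes a single row containing its TF-IDF values followed by its handcrafted features, and then train one classifier on the combined rows. The sketch below illustrates the idea with the standard library only. It uses unigram TF-IDF instead of the bigrams mentioned in the question to stay short, and the function names and the min-max scaling step are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocabulary, rows): one dense TF-IDF row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: in how many documents each token occurs.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([
            (counts[t] / len(toks)) * math.log(n_docs / doc_freq[t])
            if toks else 0.0
            for t in vocab
        ])
    return vocab, rows


def min_max_scale(rows):
    """Scale every column to [0, 1] so no extra feature dominates."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append(
            [(value - low) / span if span else 0.0 for value in column]
        )
    return [list(row) for row in zip(*scaled_columns)]


def combine(tfidf, extra):
    """Concatenate the two feature sets row by row."""
    return [t_row + e_row for t_row, e_row in zip(tfidf, extra)]


docs = ["fake news spreads", "real news reports"]
extra = [[0.9, 3], [0.1, 7]]  # e.g. subjectivity and a stopword count (made-up values)
vocab, tfidf = tfidf_rows(docs)
features = combine(tfidf, min_max_scale(extra))
print(len(features[0]))  # vocabulary size plus number of extra features
```

With SciPy and scikit-learn, the same concatenation is typically done on sparse matrices with `scipy.sparse.hstack`, which avoids materializing the dense TF-IDF rows. Whether concatenation beats alternatives such as training separate models and stacking their predictions depends on the data and would need to be checked by cross-validation.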

Use brain.js neural network to do text analysis

让人想犯罪 __ posted on 2020-01-12 02:19:27
Question: I'm trying to do some text analysis to determine whether a given string is talking about politics. I'm thinking I could create a neural network where the input is either a string or a list of words (ordering might matter?) and the output is whether the string is about politics. However, the brain.js library only takes inputs of a number between 0 and 1 or an array of numbers between 0 and 1. How can I coerce my data in such a way that I can achieve the task? Answer 1: new brain.recurrent.LSTM(); this
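Independently of the recurrent approach the answer begins to describe, a common way to satisfy the "array of numbers between 0 and 1" constraint is a bag-of-words encoding: build a fixed vocabulary from the training strings and represent each string as a vector with 1 where a vocabulary word is present and 0 where it is not. The sketch below shows the encoding in Python for illustration only; the function names are hypothetical, and the same logic would need to be ported to JavaScript before the resulting arrays could be passed to a brain.js network.

```python
import re


def tokenize(text):
    """Lowercase the text and extract simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Sorted list of every distinct word seen in the training texts."""
    return sorted({word for text in texts for word in tokenize(text)})


def encode(text, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


training_texts = ["The senate passed the bill", "I love pizza"]
vocabulary = build_vocabulary(training_texts)
print(vocabulary)
print(encode("Senate bill", vocabulary))
```

This encoding discards word order, so it cannot answer the "ordering might matter?" part of the question; words that are not in the vocabulary are simply ignored.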