classification

Simple binary logistic regression using MATLAB

Posted by 元气小坏坏 on 2021-02-06 11:00:02
Question: I'm working on doing a logistic regression using MATLAB for a simple classification problem. My covariate is one continuous variable ranging between 0 and 1, while my categorical response is a binary variable of 0 (incorrect) or 1 (correct). I'm looking to run a logistic regression to establish a predictor that would output the probability of some input observation (e.g. the continuous variable as described above) being correct or incorrect. Although this is a fairly simple scenario, I'm …
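
The excerpt is cut off, but since the goal is a predictor that outputs the probability of class 1 from a single continuous covariate, here is a minimal sketch of that workflow. It is shown in Python with scikit-learn as an equivalent, not the asker's MATLAB code; in MATLAB itself, glmfit/glmval with the 'binomial' family play the analogous role. The x/y arrays below are invented toy data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy data: one continuous covariate in [0, 1] and a binary response.
    rng = np.random.default_rng(0)
    x = rng.random(200).reshape(-1, 1)               # covariate, shape (n_samples, 1)
    y = (rng.random(200) < x.ravel()).astype(int)    # 1 = correct, 0 = incorrect

    model = LogisticRegression()
    model.fit(x, y)

    # Probability that a new observation is "correct" (class 1).
    new_obs = np.array([[0.25], [0.75]])
    print(model.predict_proba(new_obs)[:, 1])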

Twitter sentiment analysis: useful features

Posted by ◇◆丶佛笑我妖孽 on 2021-02-05 20:39:11
Question: I'm trying to implement sentiment-analysis functionality and am looking for useful features that can be extracted from tweet messages. The features I have in mind for now are: sentiment words, emoticons, exclamation marks, negation words, intensity words (very, really, etc.). Are there any other useful features for this task? My goal is not only to detect whether a tweet is positive or negative but also to detect the level of positivity or negativity (say, on a scale from 0 to 100). Any inputs …
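
One hedged way to turn the listed features (emoticons, exclamation marks, negation, intensity words) into a graded 0-100 score rather than a hard positive/negative label is a lexicon-and-rule scorer whose output is rescaled. The sketch below uses NLTK's VADER analyzer, which is my choice of tool rather than anything the question prescribes, and maps its compound score from the range -1..1 onto 0..100:

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()

    def positivity_0_100(tweet: str) -> float:
        """Map VADER's compound score (-1..1) onto a 0..100 positivity scale."""
        compound = analyzer.polarity_scores(tweet)["compound"]
        return (compound + 1.0) * 50.0

    print(positivity_0_100("I really love this!!!"))   # high score
    print(positivity_0_100("This is awful :("))        # low score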

Getting the accuracy for multi-label prediction in scikit-learn

Posted by 南笙酒味 on 2021-02-05 18:52:13
Question: In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true. This way of computing the accuracy is sometimes named, perhaps less ambiguously, the exact match ratio (1). Is there any way to get the other typical way to compute the accuracy in scikit-learn, namely (as defined in (1) and (2), and less ambiguously referred to as the Hamming score …
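
scikit-learn does not expose this metric directly, so a common approach is to compute it by hand: for each sample, take |predicted ∩ true| / |predicted ∪ true| over the label sets and average across samples. A minimal sketch, where the label indicator arrays are toy examples of mine:

    import numpy as np

    def hamming_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Per-sample label accuracy: |intersection| / |union|, averaged over samples."""
        scores = []
        for t, p in zip(y_true, y_pred):
            true_set, pred_set = set(np.where(t)[0]), set(np.where(p)[0])
            if not true_set and not pred_set:
                scores.append(1.0)  # no labels expected and none predicted
            else:
                scores.append(len(true_set & pred_set) / len(true_set | pred_set))
        return float(np.mean(scores))

    y_true = np.array([[0, 1, 1], [1, 0, 0]])
    y_pred = np.array([[0, 1, 0], [1, 0, 0]])
    print(hamming_score(y_true, y_pred))  # (0.5 + 1.0) / 2 = 0.75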

Specific number of test/train samples for each class in sklearn

Posted by 丶灬走出姿态 on 2021-02-04 21:17:10
Question: Data:

    import pandas as pd
    data = pd.DataFrame({'classes': [1, 1, 1, 2, 2, 2, 2], 'b': [3, 4, 5, 6, 7, 8, 9],
                         'c': [10, 11, 12, 13, 14, 15, 16]})

My code:

    import numpy as np
    from sklearn.cross_validation import train_test_split
    X = np.array(data[['b', 'c']])
    y = np.array(data['classes'])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=4)

train_test_split will randomly choose the test set from all the classes. Is there any way to have the same number of test samples for each class? (For example …
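
Note that sklearn.cross_validation is the old module path; in current scikit-learn, train_test_split lives in sklearn.model_selection, and its stratify= option preserves class proportions rather than guaranteeing an equal count per class. One hedged way to get exactly N test rows from each class is to sample within each class separately, e.g. with a pandas groupby; the sketch below reuses the question's toy DataFrame, and n_per_class is a name I introduce:

    import pandas as pd

    data = pd.DataFrame({'classes': [1, 1, 1, 2, 2, 2, 2], 'b': [3, 4, 5, 6, 7, 8, 9],
                         'c': [10, 11, 12, 13, 14, 15, 16]})

    n_per_class = 2  # desired number of test rows from each class

    # Draw n_per_class rows from every class for the test set; the rest is training data.
    test = data.groupby('classes', group_keys=False).apply(
        lambda g: g.sample(n=n_per_class, random_state=42))
    train = data.drop(test.index)

    X_train, y_train = train[['b', 'c']].to_numpy(), train['classes'].to_numpy()
    X_test, y_test = test[['b', 'c']].to_numpy(), test['classes'].to_numpy()
    print(y_test)  # two samples from class 1 and two from class 2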

Keras returns binary results

Posted by 天涯浪子 on 2021-02-02 09:56:41
Question: I want to predict which of 2 diseases is present, but I get the results as binary values (like 1.0 and 0.0). How can I get a confidence score like 0.7213 instead? Training code:

    from keras.models import Sequential
    from keras.layers import Conv2D
    from keras.layers import MaxPooling2D
    from keras.layers import Flatten
    from keras.layers import Dense

    # Initialising the CNN
    classifier = Sequential()

    # Step 1 - Convolution
    classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

    # Step 2 - Pooling
    …
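
A 1.0/0.0 output usually means the prediction is being thresholded (for example with predict_classes or a manual round); the raw sigmoid output of the final Dense layer is already a probability. Below is a hedged sketch of reading that probability directly. It assumes the classifier above ends in a single-unit sigmoid output (the excerpt is cut off before that point), and the image path is a placeholder:

    import numpy as np
    from keras.preprocessing import image

    # Load one image the same way the training pipeline did (64x64 RGB, scaled to [0, 1]).
    img = image.load_img("some_image.jpg", target_size=(64, 64))   # placeholder path
    x = image.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)

    prob = classifier.predict(x)[0][0]   # raw sigmoid output in [0, 1], e.g. 0.7213
    label = int(prob > 0.5)              # thresholding is what produces the hard 0/1
    print(prob, label)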

Toy example of Logistic Regression with Tensorflow probability and the titanic dataset fails

Posted by 怎甘沉沦 on 2021-01-29 20:00:18
Question: I am learning tensorflow-probability and this is a toy example of logistic regression with the titanic dataset. My model does not seem to learn and the loss is nan. I don't understand why. Below you will find three different implementations, all of which return the same results. One uses a sigmoid activation, the second uses a DistributionLambda layer with a Bernoulli distribution, and the third a DistributionLambda layer with a Beta distribution. Are there any corrections I should make to this code?
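
The asker's code is not shown here, but in this kind of setup a nan loss often comes from unscaled or NaN-containing features, or from parameterizing the Bernoulli with probabilities that drift outside [0, 1]. Below is a minimal sketch of the DistributionLambda/Bernoulli variant, parameterized with logits and trained on the negative log-likelihood. This is my sketch under those assumptions, not the asker's implementation, and X/y are random placeholders for the (cleaned, scaled) titanic features and labels:

    import numpy as np
    import tensorflow as tf
    import tensorflow_probability as tfp

    tfd = tfp.distributions

    # Placeholder data: replace with scaled, NaN-free titanic features and 0/1 labels.
    X = np.random.normal(size=(100, 5)).astype("float32")
    y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1),  # outputs a logit, not a probability
        tfp.layers.DistributionLambda(lambda t: tfd.Bernoulli(logits=t)),
    ])

    # Negative log-likelihood of the labels under the predicted Bernoulli distribution.
    nll = lambda y_true, rv_y: -rv_y.log_prob(y_true)

    model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss=nll)
    model.fit(X, y, epochs=5, verbose=0)
    print(model(X[:3]).mean().numpy())  # predicted probabilities for the first rows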

Unable to detect gibberish names using Python

Posted by 浪子不回头ぞ on 2021-01-29 10:32:16
Question: I am trying to build a Python model that could classify account names as either legitimate or gibberish. Capitalization is not important in this particular case, as some legitimate account names could be composed of all upper-case or all lower-case letters. Disclaimer: this is just an internal research/experiment and no real action will be taken on the classifier outcome. In my particular case, there are 2 possible characteristics that can reveal an account name as suspicious, gibberish or both: …
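
The excerpt stops before the two characteristics are listed, but a common baseline for this kind of problem is a character-n-gram classifier trained on examples of legitimate and gibberish names, which naturally ignores capitalization via lowercasing. A hedged sketch with scikit-learn, where the tiny name lists are invented toy data and the 0.5 cutoff is arbitrary:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training data: 1 = legitimate-looking name, 0 = gibberish.
    names = ["mary smith", "john doe", "acme consulting", "data services ltd",
             "xjqzpt vkwrr", "asdfgh qwerty", "zzxcvb nmlkj", "qqwwee rrttyy"]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]

    # Character n-grams capture "pronounceability" patterns; lowercasing removes case effects.
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3), lowercase=True),
        LogisticRegression(),
    )
    model.fit(names, labels)

    for candidate in ["SARAH CONNOR", "xkjwq zzvrpt"]:
        p_legit = model.predict_proba([candidate])[0][1]
        print(candidate, "legit" if p_legit > 0.5 else "gibberish", round(p_legit, 3))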

Why does roc_auc produce weird results in sklearn?

Posted by 半世苍凉 on 2021-01-29 05:44:56
Question: I have a binary classification problem where I use the following code to get my weighted average precision, weighted average recall, weighted average F-measure and roc_auc.

    df = pd.read_csv(input_path + input_file)
    X = df[features]
    y = df[["gold_standard"]]

    clf = RandomForestClassifier(random_state = 42, class_weight="balanced")
    k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_validate(clf, X, y, cv=k_fold, scoring = ('accuracy', 'precision_weighted', …
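
The excerpt is cut off mid scoring tuple, but one frequent source of "weird" roc_auc numbers in this setup is comparing it against metrics computed from hard class predictions: the 'roc_auc' scorer used by cross_validate ranks samples by predict_proba scores, not by the 0/1 labels that drive the weighted precision/recall/F1. A hedged sketch of how the complete call might look, with synthetic imbalanced data standing in for the asker's CSV:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Synthetic stand-in for the asker's dataframe: an imbalanced binary problem.
    X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    scores = cross_validate(
        clf, X, y, cv=k_fold,
        scoring=("accuracy", "precision_weighted", "recall_weighted",
                 "f1_weighted", "roc_auc"),   # roc_auc uses predicted probabilities internally
    )
    for name in ("test_precision_weighted", "test_recall_weighted",
                 "test_f1_weighted", "test_roc_auc"):
        print(name, scores[name].mean().round(3))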