classification

Simple binary logistic regression using MATLAB

Posted by 元气小坏坏 on 2021-02-06 11:00:02
Question: I'm working on doing a logistic regression using MATLAB for a simple classification problem. My covariate is one continuous variable ranging between 0 and 1, while my categorical response is a binary variable of 0 (incorrect) or 1 (correct). I'm looking to run a logistic regression to establish a predictor that would output the probability of some input observation (e.g. the continuous variable as described above) being correct or incorrect. Although this is a fairly simple scenario, I'm …
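
The excerpt is cut off, but since the goal is a predictor that outputs the probability of class 1 from a single continuous covariate, here is a minimal sketch of that workflow. It is shown in Python with scikit-learn as an equivalent, not the asker's MATLAB code; in MATLAB itself, glmfit/glmval with the 'binomial' family play the analogous role. The x/y arrays below are invented toy data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy data: one continuous covariate in [0, 1] and a binary response.
    rng = np.random.default_rng(0)
    x = rng.random(200).reshape(-1, 1)               # covariate, shape (n_samples, 1)
    y = (rng.random(200) < x.ravel()).astype(int)    # 1 = correct, 0 = incorrect

    model = LogisticRegression()
    model.fit(x, y)

    # Probability that a new observation is "correct" (class 1).
    new_obs = np.array([[0.25], [0.75]])
    print(model.predict_proba(new_obs)[:, 1])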

Twitter sentiment analysis: useful features

Posted by ◇◆丶佛笑我妖孽 on 2021-02-05 20:39:11
Question: I'm trying to implement sentiment-analysis functionality and am looking for useful features that can be extracted from tweet messages. The features I have in mind for now are: sentiment words, emoticons, exclamation marks, negation words, intensity words (very, really, etc.). Are there any other useful features for this task? My goal is not only to detect whether a tweet is positive or negative but also to detect the level of positivity or negativity (say, on a scale from 0 to 100). Any inputs …
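
One hedged way to turn the listed features (emoticons, exclamation marks, negation, intensity words) into a graded 0-100 score rather than a hard positive/negative label is a lexicon-and-rule scorer whose output is rescaled. The sketch below uses NLTK's VADER analyzer, which is my choice of tool rather than anything the question prescribes, and maps its compound score from the range -1..1 onto 0..100:

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()

    def positivity_0_100(tweet: str) -> float:
        """Map VADER's compound score (-1..1) onto a 0..100 positivity scale."""
        compound = analyzer.polarity_scores(tweet)["compound"]
        return (compound + 1.0) * 50.0

    print(positivity_0_100("I really love this!!!"))   # high score
    print(positivity_0_100("This is awful :("))        # low score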

Getting the accuracy for multi-label prediction in scikit-learn

Posted by 南笙酒味 on 2021-02-05 18:52:13
Question: In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true. This way of computing the accuracy is sometimes named, perhaps less ambiguously, the exact match ratio (1). Is there any way to get the other typical way to compute the accuracy in scikit-learn, namely (as defined in (1) and (2), and less ambiguously referred to as the Hamming score …
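
scikit-learn does not expose this metric directly, so a common approach is to compute it by hand: for each sample, take |predicted ∩ true| / |predicted ∪ true| over the label sets and average across samples. A minimal sketch, where the label indicator arrays are toy examples of mine:

    import numpy as np

    def hamming_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Per-sample label accuracy: |intersection| / |union|, averaged over samples."""
        scores = []
        for t, p in zip(y_true, y_pred):
            true_set, pred_set = set(np.where(t)[0]), set(np.where(p)[0])
            if not true_set and not pred_set:
                scores.append(1.0)  # no labels expected and none predicted
            else:
                scores.append(len(true_set & pred_set) / len(true_set | pred_set))
        return float(np.mean(scores))

    y_true = np.array([[0, 1, 1], [1, 0, 0]])
    y_pred = np.array([[0, 1, 0], [1, 0, 0]])
    print(hamming_score(y_true, y_pred))  # (0.5 + 1.0) / 2 = 0.75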

Specific number of test/train samples for each class in sklearn

Posted by 丶灬走出姿态 on 2021-02-04 21:17:10
Question: Data:

    import pandas as pd
    data = pd.DataFrame({'classes': [1, 1, 1, 2, 2, 2, 2], 'b': [3, 4, 5, 6, 7, 8, 9],
                         'c': [10, 11, 12, 13, 14, 15, 16]})

My code:

    import numpy as np
    from sklearn.cross_validation import train_test_split
    X = np.array(data[['b', 'c']])
    y = np.array(data['classes'])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=4)

train_test_split will randomly choose the test set from all the classes. Is there any way to have the same number of test samples for each class? (For example …
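
Note that sklearn.cross_validation is the old module path; in current scikit-learn, train_test_split lives in sklearn.model_selection, and its stratify= option preserves class proportions rather than guaranteeing an equal count per class. One hedged way to get exactly N test rows from each class is to sample within each class separately, e.g. with a pandas groupby; the sketch below reuses the question's toy DataFrame, and n_per_class is a name I introduce:

    import pandas as pd

    data = pd.DataFrame({'classes': [1, 1, 1, 2, 2, 2, 2], 'b': [3, 4, 5, 6, 7, 8, 9],
                         'c': [10, 11, 12, 13, 14, 15, 16]})

    n_per_class = 2  # desired number of test rows from each class

    # Draw n_per_class rows from every class for the test set; the rest is training data.
    test = data.groupby('classes', group_keys=False).apply(
        lambda g: g.sample(n=n_per_class, random_state=42))
    train = data.drop(test.index)

    X_train, y_train = train[['b', 'c']].to_numpy(), train['classes'].to_numpy()
    X_test, y_test = test[['b', 'c']].to_numpy(), test['classes'].to_numpy()
    print(y_test)  # two samples from class 1 and two from class 2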

Keras returns binary results

Posted by 天涯浪子 on 2021-02-02 09:56:41
Question: I want to predict which of 2 diseases is present, but I get the results as binary values (like 1.0 and 0.0). How can I get a confidence score like 0.7213 instead? Training code:

    from keras.models import Sequential
    from keras.layers import Conv2D
    from keras.layers import MaxPooling2D
    from keras.layers import Flatten
    from keras.layers import Dense

    # Initialising the CNN
    classifier = Sequential()

    # Step 1 - Convolution
    classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

    # Step 2 - Pooling
    …
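
A 1.0/0.0 output usually means the prediction is being thresholded (for example with predict_classes or a manual round); the raw sigmoid output of the final Dense layer is already a probability. Below is a hedged sketch of reading that probability directly. It assumes the classifier above ends in a single-unit sigmoid output (the excerpt is cut off before that point), and the image path is a placeholder:

    import numpy as np
    from keras.preprocessing import image

    # Load one image the same way the training pipeline did (64x64 RGB, scaled to [0, 1]).
    img = image.load_img("some_image.jpg", target_size=(64, 64))   # placeholder path
    x = image.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)

    prob = classifier.predict(x)[0][0]   # raw sigmoid output in [0, 1], e.g. 0.7213
    label = int(prob > 0.5)              # thresholding is what produces the hard 0/1
    print(prob, label)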

Toy example of Logistic Regression with Tensorflow probability and the titanic dataset fails

Posted by 怎甘沉沦 on 2021-01-29 20:00:18
Question: I am learning tensorflow-probability and this is a toy example of logistic regression with the titanic dataset. My model does not seem to learn and the loss is nan. I don't understand why. Below you will find three different implementations, all of which return the same results. One uses a sigmoid activation, the second uses a DistributionLambda layer with a Bernoulli distribution, and the third a DistributionLambda layer with a Beta distribution. Are there any corrections I should make to this code?
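
The asker's code is not shown here, but in this kind of setup a nan loss often comes from unscaled or NaN-containing features, or from parameterizing the Bernoulli with probabilities that drift outside [0, 1]. Below is a minimal sketch of the DistributionLambda/Bernoulli variant, parameterized with logits and trained on the negative log-likelihood. This is my sketch under those assumptions, not the asker's implementation, and X/y are random placeholders for the (cleaned, scaled) titanic features and labels:

    import numpy as np
    import tensorflow as tf
    import tensorflow_probability as tfp

    tfd = tfp.distributions

    # Placeholder data: replace with scaled, NaN-free titanic features and 0/1 labels.
    X = np.random.normal(size=(100, 5)).astype("float32")
    y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1),  # outputs a logit, not a probability
        tfp.layers.DistributionLambda(lambda t: tfd.Bernoulli(logits=t)),
    ])

    # Negative log-likelihood of the labels under the predicted Bernoulli distribution.
    nll = lambda y_true, rv_y: -rv_y.log_prob(y_true)

    model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss=nll)
    model.fit(X, y, epochs=5, verbose=0)
    print(model(X[:3]).mean().numpy())  # predicted probabilities for the first rows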

Unable to detect gibberish names using Python

Posted by 浪子不回头ぞ on 2021-01-29 10:32:16
Question: I am trying to build a Python model that could classify account names as either legitimate or gibberish. Capitalization is not important in this particular case, as some legitimate account names could be composed of all upper-case or all lower-case letters. Disclaimer: this is just an internal research/experiment and no real action will be taken on the classifier outcome. In my particular case, there are 2 possible characteristics that can reveal an account name as suspicious, gibberish or both: …
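
The excerpt stops before the two characteristics are listed, but a common baseline for this kind of problem is a character-n-gram classifier trained on examples of legitimate and gibberish names, which naturally ignores capitalization via lowercasing. A hedged sketch with scikit-learn, where the tiny name lists are invented toy data and the 0.5 cutoff is arbitrary:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training data: 1 = legitimate-looking name, 0 = gibberish.
    names = ["mary smith", "john doe", "acme consulting", "data services ltd",
             "xjqzpt vkwrr", "asdfgh qwerty", "zzxcvb nmlkj", "qqwwee rrttyy"]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]

    # Character n-grams capture "pronounceability" patterns; lowercasing removes case effects.
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3), lowercase=True),
        LogisticRegression(),
    )
    model.fit(names, labels)

    for candidate in ["SARAH CONNOR", "xkjwq zzvrpt"]:
        p_legit = model.predict_proba([candidate])[0][1]
        print(candidate, "legit" if p_legit > 0.5 else "gibberish", round(p_legit, 3))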

Why does roc_auc produce weird results in sklearn?

Posted by 半世苍凉 on 2021-01-29 05:44:56
Question: I have a binary classification problem where I use the following code to get my weighted average precision, weighted average recall, weighted average F-measure and roc_auc.

    df = pd.read_csv(input_path + input_file)
    X = df[features]
    y = df[["gold_standard"]]

    clf = RandomForestClassifier(random_state = 42, class_weight="balanced")
    k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_validate(clf, X, y, cv=k_fold, scoring = ('accuracy', 'precision_weighted', …
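
The excerpt is cut off mid scoring tuple, but one frequent source of "weird" roc_auc numbers in this setup is comparing it against metrics computed from hard class predictions: the 'roc_auc' scorer used by cross_validate ranks samples by predict_proba scores, not by the 0/1 labels that drive the weighted precision/recall/F1. A hedged sketch of how the complete call might look, with synthetic imbalanced data standing in for the asker's CSV:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Synthetic stand-in for the asker's dataframe: an imbalanced binary problem.
    X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    scores = cross_validate(
        clf, X, y, cv=k_fold,
        scoring=("accuracy", "precision_weighted", "recall_weighted",
                 "f1_weighted", "roc_auc"),   # roc_auc uses predicted probabilities internally
    )
    for name in ("test_precision_weighted", "test_recall_weighted",
                 "test_f1_weighted", "test_roc_auc"):
        print(name, scores[name].mean().round(3))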