classification

classification: PCA and logistic regression using sklearn

Submitted by 浪子不回头ぞ on 2019-12-07 00:48:36
Question: Step 0: Problem description. I have a classification problem, i.e. I want to predict a binary target based on a collection of numerical features, using logistic regression after running a Principal Component Analysis (PCA). I have 2 datasets: df_train and df_valid (the training set and validation set respectively) as pandas DataFrames, containing the features and the target. As a first step, I used the get_dummies pandas function to transform all the categorical variables into booleans. For
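
A minimal sketch of the flow the question describes, using stand-in frames in place of the question's real df_train / df_valid (whose exact columns are not shown): one-hot encode with get_dummies, align the dummy columns, then chain scaling, PCA and logistic regression.

    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Hypothetical stand-ins for the question's df_train / df_valid
    df_train = pd.DataFrame({"num": [1.0, 2.0, 3.0, 4.0],
                             "cat": ["a", "b", "a", "b"],
                             "target": [0, 1, 0, 1]})
    df_valid = pd.DataFrame({"num": [1.5, 3.5],
                             "cat": ["a", "b"],
                             "target": [0, 1]})

    X_train = pd.get_dummies(df_train.drop(columns="target"))
    X_valid = pd.get_dummies(df_valid.drop(columns="target"))
    # Align dummy columns so train and validation share one feature space
    X_valid = X_valid.reindex(columns=X_train.columns, fill_value=0)

    # Scale, reduce with PCA, then fit logistic regression in one pipeline
    model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
    model.fit(X_train, df_train["target"])
    print(model.score(X_valid, df_valid["target"]))

Fitting the PCA inside a pipeline on the training set only (rather than on train and validation together) avoids leaking validation information into the components.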

Creating an ARFF file from python output

Submitted by 霸气de小男生 on 2019-12-06 22:26:05
Question: gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1,
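
The excerpt shows per-document word counts keyed by file name. A hedged sketch of one way to write such dicts out as an ARFF file by hand (the document names and counts below are hypothetical stand-ins for the question's data):

    # Build an ARFF file from {document: {word: count}} dicts
    docs = {
        "doc1.html": {"march": 6, "policing": 3, "traffic": 2},
        "doc2.html": {"march": 1, "finance": 1},
    }

    # The union of all words defines the attribute set
    words = sorted({w for counts in docs.values() for w in counts})

    with open("wordcounts.arff", "w") as f:
        f.write("@RELATION wordcounts\n\n")
        for w in words:
            f.write("@ATTRIBUTE %s NUMERIC\n" % w)
        f.write("\n@DATA\n")
        for doc, counts in docs.items():
            # Missing words get a count of 0 so every row has every attribute
            f.write(",".join(str(counts.get(w, 0)) for w in words) + "\n")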

Plotting a linear discriminant analysis, classification tree and Naive Bayes Curve on a single ROC plot

Submitted by 为君一笑 on 2019-12-06 22:17:51
The data is present at the very bottom of the page and is called 'LDA.scores'. This is a classification task where I applied three supervised machine-learning classification techniques to the dataset. All the code is supplied to show how these ROC curves were produced. I apologise for asking a loaded question, but I have been trying to solve these issues using different combinations of code for almost two weeks, so if anyone can help me, thank you. The main issue is that the Naive Bayes curve shows a perfect score of 1, which is obviously wrong, and I cannot work out how to incorporate the linear
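
The question's own code appears to be in R; as a language-neutral illustration of the general recipe, here is a Python/sklearn sketch (on stand-in data) that overlays ROC curves for the three model types on one plot. Note that it scores on held-out data: evaluating on the training set is one common way to end up with a suspiciously perfect Naive Bayes curve.

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = {"LDA": LinearDiscriminantAnalysis(),
              "Tree": DecisionTreeClassifier(max_depth=3),
              "Naive Bayes": GaussianNB()}
    for name, m in models.items():
        # Score the held-out split, not the training data
        scores = m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        fpr, tpr, _ = roc_curve(y_te, scores)
        plt.plot(fpr, tpr, label="%s (AUC=%.2f)" % (name, auc(fpr, tpr)))
    plt.plot([0, 1], [0, 1], "k--")  # chance line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()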

Clustering and Bayes classifiers Matlab

Submitted by ☆樱花仙子☆ on 2019-12-06 21:55:34
Question: So I am at a crossroads about what to do next. I set out to learn and apply some machine-learning algorithms to a complicated dataset, and I have now done this. My plan from the very beginning was to combine two possible classifiers in an attempt to make a multi-classification system. But here is where I am stuck. I chose a clustering algorithm (Fuzzy C-Means) (after learning some sample K-means material) and Naive Bayes as the two candidates for the MCS (Multi-Classifier System). I can use both
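
The question is about Matlab; this Python/sklearn sketch only illustrates one common way to combine a clusterer with Naive Bayes in such a system: feed cluster membership to the classifier as an extra feature. (Fuzzy C-Means is approximated here by hard KMeans, which sklearn provides.)

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=300, random_state=0)  # stand-in data

    # Stage 1: unsupervised clustering assigns each sample a cluster id
    clusters = KMeans(n_clusters=4, random_state=0).fit_predict(X)

    # Stage 2: append the cluster id as a feature and train Naive Bayes on it
    X_aug = np.column_stack([X, clusters])
    print(GaussianNB().fit(X_aug, y).score(X_aug, y))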

How to use or abuse artifact classifiers in maven?

Submitted by 狂风中的少年 on 2019-12-06 19:10:51
Question: We are currently attempting to port a very (very) large project built with Ant to Maven (while also moving to svn). All possibilities are being explored in remodeling the project structure to best fit the Maven paradigm. Now, to be more specific, I have come across classifiers and would like to know how I could use them to my advantage, while refraining from "classifier anti-patterns". Thanks. From http://maven.apache.org/pom.html: classifier: You may occasionally find a fifth element on the
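
As one concrete illustration (a generic POM sketch with hypothetical coordinates, not taken from the question): a classifier distinguishes multiple artifacts built from the same groupId/artifactId/version, such as a sources or javadoc attachment.

    <dependency>
      <groupId>com.example</groupId>          <!-- hypothetical coordinates -->
      <artifactId>some-library</artifactId>
      <version>1.0</version>
      <classifier>sources</classifier>        <!-- selects some-library-1.0-sources.jar -->
    </dependency>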

Multi-layer neural network won't predict negative values

Submitted by 时光怂恿深爱的人放手 on 2019-12-06 17:06:02
Question: I have implemented a multilayer perceptron to predict the sine of input vectors. The vectors consist of four values chosen at random from {-1, 0, 1}, plus a bias set to 1. The network should predict the sine of the sum of the vector's contents, e.g. input = <0, 1, -1, 0, 1>, output = sin(0 + 1 + (-1) + 0 + 1). The problem I am having is that the network never predicts a negative value, even though many of the vectors' sine values are negative. It predicts all positive or zero outputs perfectly. I am presuming that there is a problem
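
A hedged guess at the cause (the question's code is not shown): if the output unit uses a logistic sigmoid, it can only emit values in (0, 1), so sine targets in [-1, 0) are unreachable. A tanh output (range (-1, 1)) or a linear output unit covers the negative targets. A minimal numpy illustration of the difference:

    import numpy as np

    z = np.array([-2.0, -0.5, 0.5, 2.0])  # example pre-activations
    print(1 / (1 + np.exp(-z)))           # sigmoid: every output is in (0, 1)
    print(np.tanh(z))                     # tanh: outputs can go negative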

Why do Tensorflow tf.learn classification results vary a lot?

Submitted by 允我心安 on 2019-12-06 15:30:55
Question: I use the TensorFlow high-level API tf.learn to train and evaluate a DNN classifier for a series of binary text classifications (actually I need multi-label classification, but at the moment I check every label separately). My code is very similar to the tf.learn tutorial:

    classifier = tf.contrib.learn.DNNClassifier(
        hidden_units=[10],
        n_classes=2,
        dropout=0.1,
        feature_columns=tf.contrib.learn.infer_real_valued_columns_from_input(training_set.data))
    classifier.fit(x=training_set.data, y
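
A hedged sketch of one common fix, assuming the run-to-run variation comes from unseeded randomness (weight initialization, dropout, data shuffling): pin the seeds. The RunConfig tf_random_seed parameter existed in the TF 1.x contrib.learn API of that era; stand-in arrays replace the question's training_set.

    import numpy as np
    import tensorflow as tf

    np.random.seed(42)
    tf.set_random_seed(42)  # TF 1.x global graph-level seed

    data = np.random.rand(100, 4).astype(np.float32)  # stand-in for training_set.data
    labels = np.random.randint(0, 2, 100)             # stand-in binary labels

    config = tf.contrib.learn.RunConfig(tf_random_seed=42)
    classifier = tf.contrib.learn.DNNClassifier(
        hidden_units=[10], n_classes=2, dropout=0.1,
        feature_columns=tf.contrib.learn.infer_real_valued_columns_from_input(data),
        config=config)
    classifier.fit(x=data, y=labels, steps=100)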

Get recall (sensitivity) and precision (PPV) values of a multi-class problem in PyML

Submitted by 你离开我真会死。 on 2019-12-06 13:55:55
Question: I am using PyML for SVM classification. However, I noticed that when I evaluate a multi-class classifier using LOO, the results object does not report the sensitivity and PPV values. Instead they are 0.0:

    from PyML import *
    from PyML.classifiers import multi

    mc = multi.OneAgainstRest(SVM())
    data = VectorDataSet('iris.data', labelsColumn=-1)
    result = mc.loo(data)
    result.getSuccessRate()
    >>> 0.95333333333333337
    result.getPPV()
    >>> 0.0
    result.getSensitivity()
    >>> 0.0

I have looked at the code
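
Whatever PyML does internally, one workaround is to compute per-class recall and precision directly from the predicted labels, swapping in sklearn.metrics instead of PyML's result object. A sketch on hypothetical label arrays standing in for the LOO output:

    from sklearn.metrics import precision_score, recall_score

    # Hypothetical true/predicted labels extracted from the LOO results
    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 1, 2, 1, 1, 0]
    print(precision_score(y_true, y_pred, average=None))  # per-class PPV
    print(recall_score(y_true, y_pred, average=None))     # per-class sensitivity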

Newbie: where to start given a problem to predict future success or not

Submitted by 好久不见. on 2019-12-06 13:44:43
Question: We have a production web-based product that allows users to make predictions about the future value (or demand) of goods. The historical data contains about 100k examples, and each example has about 5 parameters. Consider a class of data called a prediction:

    prediction {
        id: int
        predictor: int
        predictionDate: date
        predictedProductId: int
        predictedDirection: byte (0 for decrease, 1 for increase)
        valueAtPrediction: float
    }

and a paired result class that measures the result of the prediction:
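
A minimal starting-point sketch under one reasonable reading of the problem: treat it as binary classification of whether a prediction turned out correct, with the prediction's fields as features. The file name, column names, and the wasCorrect label are hypothetical; the real label would come from joining in the result class.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("predictions.csv")  # hypothetical export of the 100k examples
    X = df[["predictor", "predictedProductId",
            "predictedDirection", "valueAtPrediction"]]
    y = df["wasCorrect"]  # hypothetical binary label from the paired result class

    # 5-fold cross-validated accuracy as a first baseline
    print(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())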

Incremental Decision Tree C++ Implementation

Submitted by 霸气de小男生 on 2019-12-06 13:15:18
Does anyone know of an incremental implementation of a decision tree classifier, such that it can produce an optimal decision tree classifier when you add a new instance to the training set, with low computation and as quickly as possible, given the existing decision tree classifier? In other words, I have an optimal decision tree classifier for set A, named T_1; now I want to add instance X to set A and find the optimal decision tree classifier T_2 for the set {A, X} by taking advantage of T_1 and X. Adding instances will occur several times, so it would be valuable for me to find an incremental method instead
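
The question asks for C++, and no particular C++ library is endorsed here. As an illustration of the usual incremental family (Hoeffding/VFDT-style trees, which update the tree one instance at a time instead of rebuilding it), a sketch using the Python river library, assuming it is installed:

    from river import tree

    model = tree.HoeffdingTreeClassifier()
    stream = [({"f1": 0.2, "f2": 1.0}, True),
              ({"f1": 0.9, "f2": 0.1}, False)]
    for x, y in stream:        # instances arrive one at a time
        model.learn_one(x, y)  # incremental update; no full retraining
    print(model.predict_one({"f1": 0.5, "f2": 0.5}))

Note that Hoeffding trees trade exact optimality for speed: they guarantee (with high probability) the same splits a batch learner would choose, rather than recomputing the optimal tree after every instance.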