classification

How to do binary classification in Spark ML without StringIndexer

爷,独闯天下 submitted on 2019-12-23 04:24:09
Question: I am trying to use the Spark ML DecisionTreeClassifier in a Pipeline without StringIndexer, because my feature is already indexed as (0.0; 1.0). DecisionTreeClassifier requires double values as the label, so this code should work: def trainDecisionTreeModel(training: RDD[LabeledPoint], sqlc: SQLContext): Unit = { import sqlc.implicits._ val trainingDF = training.toDF() //format of this dataframe: [label: double, features: vector] val featureIndexer = new VectorIndexer() .setInputCol("features") .setOutputCol

Classification tree in R limit to 32 levels

假装没事ソ submitted on 2019-12-23 03:59:12
Question: I am trying to create a classification tree in R using the package tree. This is an excerpt of the dataset I am using (header included):
CENTRO_EXAMEN,NOMBRE_AUTOESCUELA,MES,TIPO_EXAMEN,NOMBRE_PERMISO,PROB
Alcalá de Henares,17APTOV,5,PRUEBA DESTREZA,A2 ,0
Alcalá de Henares,17APTOV,5,PRUEBA CONDUCCION Y CIRCULACION,B ,0.8
Alcalá de Henares,17APTOV,5,PRUEBA TEORICA,B ,0.333333333
Alcalá de Henares,2000,5,PRUEBA TEORICA,B ,0
and these are the commands I am issuing to R: madrid=read.csv("madrid.csv
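
The title refers to the limit of 32 levels per factor predictor in R's tree package. One common workaround is to collapse rare levels into a single catch-all level before fitting. The sketch below illustrates that idea only; it is written in Python for consistency with the other examples on this page, not in R, and the function name `collapse_rare_levels`, the bucket label `"OTHER"`, and the sample values `"ABC"` and `"XYZ"` are hypothetical.

```python
from collections import Counter


def collapse_rare_levels(values, max_levels=32, other="OTHER"):
    """Keep the most frequent levels and map the rest to one bucket.

    Assumes `other` does not already occur as a level in `values`.
    """
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)
    # Reserve one of the allowed levels for the catch-all bucket.
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other for v in values]


# "17APTOV" and "2000" come from the excerpt; "ABC" and "XYZ" are made up.
schools = ["17APTOV", "17APTOV", "17APTOV", "2000", "ABC", "XYZ"]
collapsed = collapse_rare_levels(schools, max_levels=3)
```

Whether collapsing is acceptable depends on the data: if the rare driving schools matter for the prediction, a different encoding may be a better fit.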

How can I classify different images with various sizes and formats in scikit-learn?

人盡茶涼 submitted on 2019-12-23 03:58:08
Question: I'm trying to build a simple image classifier using scikit-learn. I'm hoping to avoid having to resize and convert each image before training. Given two images of different formats and sizes (1.jpg and 2.png), how can I avoid a ValueError while fitting the model? I have one example where I train using only 1.jpg, which fits successfully. I have another example where I train using both 1.jpg and 2.png and a ValueError is produced. This example will fit successfully:
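
scikit-learn estimators expect a 2-D array in which every sample has the same number of features, so images of different sizes cannot be stacked directly. One way to avoid resizing is to map each image to a fixed-length descriptor. The sketch below uses a normalised intensity histogram as that descriptor; this is an assumption about what might suit the problem, not the approach from the question. The function name `histogram_features` is hypothetical, and the random arrays stand in for decoded image files (loading 1.jpg and 2.png would need an image library such as Pillow).

```python
import numpy as np


def histogram_features(image, bins=32):
    """Map an image array of any height and width to a fixed-length vector."""
    arr = np.asarray(image, dtype=np.float64)
    if arr.ndim == 3:
        # Crude greyscale conversion: average up to the first three channels,
        # which drops the alpha channel of RGBA input.
        arr = arr[..., :3].mean(axis=2)
    hist, _ = np.histogram(arr, bins=bins, range=(0.0, 255.0))
    # Normalise so the number of pixels does not affect the scale.
    return hist / hist.sum()


rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(40, 60, 3))  # stand-in for 1.jpg (RGB)
img_b = rng.integers(0, 256, size=(25, 30, 4))  # stand-in for 2.png (RGBA)

# Both rows have `bins` columns, so X can be passed to an estimator's fit().
X = np.vstack([histogram_features(img_a), histogram_features(img_b)])
```

A histogram discards all spatial information, so it only helps when overall brightness or colour distribution separates the classes; otherwise resizing to a common shape is hard to avoid.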

Image classification and image resizing

醉酒当歌 submitted on 2019-12-23 03:46:15
Question: I have a set of images that I am using for a typical classification problem using Tensorflow. The images come in different sizes, so I wrote a small piece of code to resize them all. But what is the best resizing strategy for training purposes? For example, is it better to resize them regardless of how they scale up or down, or is it better to keep the aspect ratio and add some artificial zero padding around the resized images? I believe this is a typical question with some
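
The second option in the question, keeping the aspect ratio and zero-padding the remainder, can be sketched as follows. This is a minimal NumPy illustration with nearest-neighbour sampling, not the asker's code; the function name `resize_with_padding` and the target size of 64 are assumptions.

```python
import numpy as np


def resize_with_padding(image, target=64):
    """Scale the longer side to `target`, keep the aspect ratio, zero-pad the rest."""
    arr = np.asarray(image)
    h, w = arr.shape[:2]
    scale = target / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Nearest-neighbour sampling: pick one source row/column per output pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = arr[rows][:, cols]
    # Centre the resized image on a square canvas filled with zeros.
    canvas = np.zeros((target, target) + arr.shape[2:], dtype=arr.dtype)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


img = np.ones((100, 50, 3), dtype=np.uint8)  # tall dummy image
padded = resize_with_padding(img, target=64)
```

Here the 100x50 input becomes a 64x32 image centred on a 64x64 canvas, with 16 zero columns on each side. TensorFlow also provides `tf.image.resize_with_pad`, which performs the same kind of aspect-preserving resize with better interpolation.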

Use sklearn DBSCAN model to classify new entries

て烟熏妆下的殇ゞ submitted on 2019-12-23 01:38:12
Question: I have a huge "dynamic" dataset and I'm trying to find interesting clusters in it. After running a lot of different unsupervised clustering algorithms, I have found a configuration of DBSCAN which gives coherent results. I would like to extrapolate the model that DBSCAN creates from my test data and apply it to other datasets, but without re-running the algorithm. I cannot run the algorithm over the whole dataset because it would run out of memory, and the model might not make sense to me
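
scikit-learn's DBSCAN has no `predict` method, so applying a fitted clustering to new data needs an explicit rule. One rule that follows DBSCAN's own definition is to give each new point the label of its nearest core sample if that sample lies within `eps`, and to mark it as noise otherwise. The sketch below implements that rule with NumPy only; the function name `dbscan_predict` is hypothetical. With a fitted scikit-learn model, `core_points` would correspond to `components_` and `core_labels` to `labels_[core_sample_indices_]`.

```python
import numpy as np


def dbscan_predict(core_points, core_labels, new_points, eps):
    """Label each new point with the cluster of its nearest core point,
    or -1 (noise) if no core point lies within `eps`."""
    core_points = np.asarray(core_points, dtype=float)
    core_labels = np.asarray(core_labels)
    new_points = np.asarray(new_points, dtype=float)
    labels = np.full(len(new_points), -1, dtype=int)
    # One point at a time keeps memory use at O(number of core points).
    for i, point in enumerate(new_points):
        dists = np.linalg.norm(core_points - point, axis=1)
        nearest = np.argmin(dists)
        if dists[nearest] <= eps:
            labels[i] = core_labels[nearest]
    return labels


# Toy core samples for two clusters (hypothetical values).
core_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
core_labels = [0, 0, 1, 1]
new_points = [[0.2, 0.5], [10.1, 10.4], [5.0, 5.0]]
predicted = dbscan_predict(core_points, core_labels, new_points, eps=1.0)
```

New points never become core samples under this rule, so the clusters themselves stay fixed; if the data distribution drifts, the model has to be refitted.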

scikit multilabel classification: ValueError: bad input shape

丶灬走出姿态 submitted on 2019-12-23 01:21:10
Question: I believe SGDClassifier() with loss='log' supports multilabel classification and I do not have to use OneVsRestClassifier. Check this. Now, my dataset is quite big and I am using HashingVectorizer and passing the result as input to SGDClassifier. My target has 42048 features. When I run this as follows: clf.partial_fit(X_train_batch, y) I get: ValueError: bad input shape (300000, 42048). I have also passed classes as a parameter as follows, but I still get the same problem. clf.partial_fit(X_train_batch,
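
The error is consistent with `SGDClassifier.partial_fit` expecting a 1-D target, so a 2-D indicator matrix is rejected. A minimal sketch of one workaround, assuming a separate binary classifier per label column is acceptable: the toy documents and labels are hypothetical, and `loss` is left at its default because the name of the logistic loss differs between scikit-learn versions (`'log'` in older releases, `'log_loss'` in newer ones).

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy batch: three documents, two labels (columns of Y).
docs = ["spark ml pipeline", "image resize padding", "spark image"]
Y = np.array([[1, 0], [0, 1], [1, 1]])

vectorizer = HashingVectorizer(n_features=2**10)
X = vectorizer.transform(docs)

# One binary classifier per label column; each sees a 1-D target.
classifiers = [SGDClassifier(random_state=0) for _ in range(Y.shape[1])]
for j, clf in enumerate(classifiers):
    # `classes` must list every class on the first partial_fit call.
    clf.partial_fit(X, Y[:, j], classes=np.array([0, 1]))

# Stack the per-label predictions back into an indicator matrix.
pred = np.column_stack([clf.predict(X) for clf in classifiers])
```

This is the one-vs-rest pattern written out by hand. With 42048 labels it means 42048 models, so memory and training time need checking before applying it to the full dataset.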

Large scale naïve Bayes classifier with top-k output

Deadly submitted on 2019-12-22 18:29:50
Question: I need a library for large-scale naïve Bayes, with millions of training examples and 100k+ binary features. It must be an online version (updatable after training). I also need top-k output, that is, multiple classifications for a single instance. Accuracy is not very important. The purpose is an automatic text categorization application. Any suggestions for a good library are much appreciated. EDIT: The library should preferably be in Java. Answer 1: If a learning algorithm other than naïve Bayes
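
The two requirements, online updates and top-k output, fit naïve Bayes naturally, because the model is just a set of counts. The sketch below is a small illustration of that idea and not a library recommendation. It is in Python rather than the Java the asker prefers, the class name `OnlineNaiveBayes` and the sample categories are hypothetical, and it scores only the features that are present (a multinomial-style simplification of the binary-feature model).

```python
import heapq
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Count-based naive Bayes over sets of active binary features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = defaultdict(int)    # examples seen per class
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        self.feature_totals = defaultdict(int)  # active features seen per class
        self.vocabulary = set()

    def update(self, features, label):
        """Incorporate one training example; can be called at any time."""
        self.class_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1
            self.feature_totals[label] += 1
            self.vocabulary.add(f)

    def top_k(self, features, k=3):
        """Return the k highest-scoring (label, log-score) pairs."""
        n = sum(self.class_counts.values())
        v = max(len(self.vocabulary), 1)
        scores = []
        for label, count in self.class_counts.items():
            score = math.log(count / n)  # log prior
            denom = self.feature_totals[label] + self.alpha * v
            for f in features:
                seen = self.feature_counts[label].get(f, 0)
                score += math.log((seen + self.alpha) / denom)
            scores.append((label, score))
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])


nb = OnlineNaiveBayes()
nb.update({"goal", "match"}, "sports")
nb.update({"match", "team"}, "sports")
nb.update({"election", "vote"}, "politics")
nb.update({"stock", "market"}, "finance")
ranked = nb.top_k({"match", "team"}, k=2)
```

Because `update` only increments counters, new examples can be added after the first predictions have been served, and `top_k` costs one pass over the classes per query.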

Map predictions back to IDs - Python Scikit Learn DecisionTreeClassifier

六眼飞鱼酱① submitted on 2019-12-22 17:17:09
Question: I have a dataset that has a unique identifier and other features. It looks like this:
ID       LenA  TypeA  LenB  TypeB  Diff  Score  Response
123-456  51    M      101   L      50    0.2    0
234-567  46    S      49    S      3     0.9    1
345-678  87    M      70    M      17    0.7    0
I split it up into training and test data. I am trying to classify the test data into two classes using a classifier trained on the training data. I want to keep the identifier in the training and testing datasets so I can map the predictions back to the IDs. Is there a way that I can assign the
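
One way to keep predictions aligned with IDs is to leave the identifier out of the feature matrix and split it alongside the data, since `train_test_split` accepts any number of arrays and splits them consistently. A minimal sketch under those assumptions: only the numeric columns (LenA, LenB, Diff, Score) are used, the first three rows come from the question, and the last three rows are hypothetical additions so that the split has something to work with.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

ids = ["123-456", "234-567", "345-678", "456-789", "567-890", "678-901"]
X = [
    [51, 101, 50, 0.2],   # from the question
    [46, 49, 3, 0.9],     # from the question
    [87, 70, 17, 0.7],    # from the question
    [60, 62, 2, 0.8],     # hypothetical
    [30, 90, 60, 0.1],    # hypothetical
    [55, 54, 1, 0.95],    # hypothetical
]
y = [0, 1, 0, 1, 0, 1]

# The IDs are split with the same shuffling as X and y.
X_train, X_test, y_train, y_test, ids_train, ids_test = train_test_split(
    X, y, ids, test_size=0.33, random_state=0
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# predict() keeps row order, so position i belongs to ids_test[i].
predictions_by_id = dict(zip(ids_test, clf.predict(X_test)))
```

The identifier never reaches the classifier, which also prevents the tree from splitting on it by accident.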

How to examine the feature weights of a Tensorflow LinearClassifier?

旧巷老猫 submitted on 2019-12-22 13:52:59
Question: I am trying to understand the Large-scale Linear Models with TensorFlow documentation. The docs motivate these models as follows: "Linear model can be interpreted and debugged more easily than neural nets. You can examine the weights assigned to each feature to figure out what's having the biggest impact on a prediction." So I ran the extended code example from the accompanying TensorFlow Linear Model Tutorial. In particular, I ran the example code from GitHub with the model-type flag set to

Which resource structure to use for both Android tablets and mobiles?

♀尐吖头ヾ submitted on 2019-12-22 10:34:53
Question: I last updated my app's resources when higher-density devices came out. I started drawing my icons in higher resolution and supplied them through the res/drawable-hdpi directory structure. So far, so good. Now I wanted to make some changes to adapt to Android tablets. I updated a few layouts and then slowly realized why all my icons looked somewhat strange to me on that nice display: the "normal" 10 inch tablets are not classified as hdpi, but mdpi, so all my "old"