classification | 易学教程

How to make binary classification in Spark ML without StringIndexer

Question: I am trying to use the Spark ML DecisionTreeClassifier in a Pipeline without StringIndexer, because my feature is already indexed as (0.0; 1.0). DecisionTreeClassifier requires double values for the label, so this code should work:
def trainDecisionTreeModel(training: RDD[LabeledPoint], sqlc: SQLContext): Unit = {
  import sqlc.implicits._
  val trainingDF = training.toDF()
  //format of this dataframe: [label: double, features: vector]
  val featureIndexer = new VectorIndexer()
    .setInputCol("features")
    .setOutputCol

Classification tree in R limited to 32 levels

Question: I am trying to create a classification tree in R using the package tree. This is an excerpt of the dataset I am using (header included):
CENTRO_EXAMEN,NOMBRE_AUTOESCUELA,MES,TIPO_EXAMEN,NOMBRE_PERMISO,PROB
Alcalá de Henares,17APTOV,5,PRUEBA DESTREZA,A2 ,0
Alcalá de Henares,17APTOV,5,PRUEBA CONDUCCION Y CIRCULACION,B ,0.8
Alcalá de Henares,17APTOV,5,PRUEBA TEORICA,B ,0.333333333
Alcalá de Henares,2000,5,PRUEBA TEORICA,B ,0
and these are the commands I am issuing to R: madrid=read.csv("madrid.csv
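Judging by the title, the problem is the tree package's limit of 32 levels per factor predictor, which a column such as NOMBRE_AUTOESCUELA can easily exceed. One common workaround is to keep the most frequent levels and lump the rest into a single catch-all level. The sketch below shows that lumping step in Python, assuming the column is a plain list of strings; `lump_rare_levels` and the `"OTHER"` label are hypothetical names, and in R the same idea would be applied to the factor before calling `tree()`.

```python
from collections import Counter


def lump_rare_levels(values, max_levels=32, other="OTHER"):
    """Keep the (max_levels - 1) most frequent levels and map the rest to `other`.

    One slot is reserved for `other`, so the result has at most max_levels levels.
    """
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)  # already within the limit, nothing to lump
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other for v in values]


# Hypothetical column with 40 distinct driving-school names
schools = [f"school_{i % 40}" for i in range(200)]
lumped = lump_rare_levels(schools)
print(len(set(lumped)))  # 32
```

Which levels survive is a modelling choice; lumping by frequency is only the simplest option.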

How can I classify different images with various sizes and formats in scikit-learn?

Question: I'm trying to build a simple image classifier using scikit-learn. I'm hoping to avoid having to resize and convert each image before training. Given two images that differ in format and size (1.jpg and 2.png), how can I avoid a ValueError while fitting the model? I have one example where I train using only 1.jpg, which fits successfully, and another where I train using both 1.jpg and 2.png, which raises a ValueError. This example will fit successfully:
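A likely cause of the ValueError is that scikit-learn needs every sample to be a feature vector of the same length, and two images with different pixel dimensions flatten to rows of different lengths. A minimal sketch, assuming each image has already been decoded into a 2-D list of grayscale values (decoding JPEG/PNG is left out), resamples every image onto one fixed grid with nearest-neighbour sampling before flattening. `resize_nearest` and `to_feature_vector` are hypothetical helpers, and the 8x8 grid is an arbitrary choice.

```python
def resize_nearest(image, out_h, out_w):
    """Resample a 2-D list of pixels to out_h x out_w using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_feature_vector(image, size=(8, 8)):
    """Resize to a fixed grid, then flatten row by row into size[0] * size[1] features."""
    resized = resize_nearest(image, *size)
    return [pixel for row in resized for pixel in row]


small = [[0, 255], [255, 0]]                                       # 2x2 "image"
large = [[(r + c) % 256 for c in range(30)] for r in range(20)]    # 20x30 "image"
X = [to_feature_vector(img) for img in (small, large)]
print([len(row) for row in X])  # [64, 64]
```

Every row of `X` now has the same length, which is the shape an estimator's `fit` expects.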

Image classification and image resizing

Question: I have a set of images that I am using for a typical classification problem with TensorFlow. The images come in different sizes, so I wrote a small piece of code to resize them all. But what is the best resizing strategy for training purposes? For example, is it better to resize them regardless of how they scale up or down, or is it better to keep the aspect ratio and add some artificial zero padding around the resized images? I believe this is a typical question with some
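The second option in the question (keep the aspect ratio, then zero-pad) is often called letterboxing. The sketch below only computes its geometry: the scaled size and how many padding pixels go on each side of a fixed canvas; the actual resampling would be done by whatever image library is in use. `letterbox_geometry` is a hypothetical helper, and the sketch does not settle which strategy trains better.

```python
def letterbox_geometry(width, height, target_w, target_h):
    """Return (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom) for an
    aspect-ratio-preserving resize into a target_w x target_h canvas with zero padding."""
    # The smaller ratio guarantees the scaled image fits in both dimensions
    scale = min(target_w / width, target_h / height)
    new_w = min(target_w, max(1, round(width * scale)))
    new_h = min(target_h, max(1, round(height * scale)))
    pad_w, pad_h = target_w - new_w, target_h - new_h
    # Split the padding as evenly as possible; any odd pixel goes to the right/bottom
    left, top = pad_w // 2, pad_h // 2
    return new_w, new_h, left, top, pad_w - left, pad_h - top


print(letterbox_geometry(640, 480, 224, 224))  # (224, 168, 0, 28, 0, 28)
```

Plain resizing would instead stretch the 640x480 image to 224x224, changing its aspect ratio but using every pixel of the canvas.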

Use sklearn DBSCAN model to classify new entries

Question: I have a huge "dynamic" dataset and I'm trying to find interesting clusters in it. After running many different unsupervised clustering algorithms, I have found a DBSCAN configuration that gives coherent results. I would like to extrapolate the model that DBSCAN creates from my test data and apply it to other datasets without re-running the algorithm. I cannot run the algorithm over the whole dataset because it would run out of memory, and the model might not make sense to me
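One commonly suggested approximation is to treat the fitted core samples as the model: a new point joins the cluster of its nearest core sample if that sample lies within `eps`, and is otherwise labelled noise (-1). This does not let new points become core samples or merge clusters, so it is not equivalent to re-running DBSCAN. In scikit-learn the core samples are exposed as `components_`, with their labels at `labels_[core_sample_indices_]`; the sketch below assumes they have been exported to plain lists, and `dbscan_predict` is a hypothetical helper using Euclidean distance.

```python
import math


def dbscan_predict(new_points, core_points, core_labels, eps):
    """Label each new point with the cluster of its nearest core sample within eps,
    or -1 (noise) when no core sample is that close."""
    labels = []
    for p in new_points:
        best_label, best_dist = -1, float("inf")
        for core, label in zip(core_points, core_labels):
            d = math.dist(p, core)  # Euclidean distance (Python 3.8+)
            if d <= eps and d < best_dist:
                best_label, best_dist = label, d
        labels.append(best_label)
    return labels


# Hypothetical core samples from a fitted model: two clusters
core_points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
core_labels = [0, 0, 1]
result = dbscan_predict([(0.2, 0.1), (9.8, 10.1), (5.0, 5.0)], core_points, core_labels, eps=1.0)
print(result)  # [0, 1, -1]
```

The loop is O(new points x core samples); for large data a spatial index (for example a k-d tree) would replace the inner loop.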

scikit multilabel classification: ValueError: bad input shape

Question: I believe SGDClassifier() with loss='log' supports multilabel classification and that I do not have to use OneVsRestClassifier. Now, my dataset is quite big, so I am using HashingVectorizer and passing its result as input to SGDClassifier. My target has 42048 features. When I run clf.partial_fit(X_train_batch, y), I get: ValueError: bad input shape (300000, 42048). I have also passed classes as a parameter, as follows, but I still get the same problem: clf.partial_fit(X_train_batch,
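The reported shape suggests `y` is a 2-D label-indicator matrix (one column per label), while `SGDClassifier.partial_fit` expects a 1-D target, so the premise that it handles multilabel input directly may not hold. A one-vs-rest scheme, which is what `OneVsRestClassifier` automates, trains one binary classifier per label column. The sketch below only performs that split on a tiny hypothetical matrix; each resulting vector could then be passed as `y` to its own binary classifier, e.g. `partial_fit(X_batch, y_j, classes=[0, 1])`. `indicator_columns` is a hypothetical helper.

```python
def indicator_columns(Y):
    """Split a multilabel indicator matrix (rows = samples, columns = labels)
    into one binary target vector per label, as a one-vs-rest scheme trains them."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


Y = [
    [1, 0, 1],  # sample 0 has labels 0 and 2
    [0, 1, 0],  # sample 1 has label 1
    [1, 1, 0],  # sample 2 has labels 0 and 1
]
per_label = indicator_columns(Y)
print(per_label)  # [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
```

With 42048 label columns this means 42048 binary models, so memory is worth checking before going this route.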

Large scale naïve Bayes classifier with top-k output

Question: I need a large-scale naïve Bayes library that handles millions of training examples and 100k+ binary features. It must be an online version (updatable after training). I also need top-k output, that is, multiple classifications for a single instance. Accuracy is not very important. The purpose is an automatic text categorization application. Any suggestions for a good library are much appreciated. EDIT: The library should preferably be in Java. Answer 1: If a learning algorithm other than naïve Bayes
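To illustrate the two requirements (online updates and top-k output), here is a small Python sketch rather than a Java library recommendation. It is a multinomial naïve Bayes over the set of active features with add-alpha smoothing, swapped in for the Bernoulli variant often used with binary features; training only increments counts, so `update` can be called at any time, and `heapq.nlargest` returns the k best classes. `OnlineMultinomialNB` and its methods are hypothetical names.

```python
import heapq
import math
from collections import defaultdict


class OnlineMultinomialNB:
    """Incrementally updatable multinomial naive Bayes over sparse feature sets,
    with add-alpha smoothing and top-k prediction."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = defaultdict(int)                           # examples seen per class
        self.feature_counts = defaultdict(lambda: defaultdict(int))  # class -> feature -> count
        self.class_totals = defaultdict(int)                         # feature occurrences per class
        self.vocab = set()
        self.n_docs = 0

    def update(self, features, label):
        """Add one training example; can be called again after earlier training."""
        self.n_docs += 1
        self.class_docs[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1
            self.class_totals[label] += 1
            self.vocab.add(f)

    def top_k(self, features, k=3):
        """Return the k highest-scoring (label, log-score) pairs, best first."""
        v = len(self.vocab)
        scores = {}
        for label, docs in self.class_docs.items():
            score = math.log(docs / self.n_docs)  # log prior
            denom = self.class_totals[label] + self.alpha * v
            counts = self.feature_counts[label]
            for f in features:
                # .get avoids inserting unseen features into the defaultdict
                score += math.log((counts.get(f, 0) + self.alpha) / denom)
            scores[label] = score
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


nb = OnlineMultinomialNB()
nb.update({"ball", "goal", "team"}, "sports")
nb.update({"vote", "election"}, "politics")
nb.update({"stock", "market"}, "finance")
top = nb.top_k({"goal", "team"}, k=2)
print(top[0][0])  # sports
```

At the scale in the question, scoring every class for every query is the main cost; the per-class work is proportional only to the number of active features, not the full 100k.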

Map predictions back to IDs - Python Scikit Learn DecisionTreeClassifier

Question: I have a dataset that has a unique identifier and other features. It looks like this:
ID       LenA  TypeA  LenB  TypeB  Diff  Score  Response
123-456  51    M      101   L      50    0.2    0
234-567  46    S      49    S      3     0.9    1
345-678  87    M      70    M      17    0.7    0
I split it into training and test data. I am trying to classify the test data into two classes using a classifier trained on the training data. I want the identifier in the training and testing datasets so I can map the predictions back to the IDs. Is there a way that I can assign the
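The usual pattern is to keep the ID column out of the feature matrix but carry it alongside in the same row order, then pair it with the predictions afterwards. A minimal sketch using the three rows shown above: `split_ids` is a hypothetical helper, and `stand_in_predict` is a hypothetical placeholder for a fitted classifier's `predict` (for example `DecisionTreeClassifier.predict`), not a real model. A real tree would also need TypeA/TypeB encoded as numbers first.

```python
rows = [
    {"ID": "123-456", "LenA": 51, "TypeA": "M", "LenB": 101, "TypeB": "L", "Diff": 50, "Score": 0.2, "Response": 0},
    {"ID": "234-567", "LenA": 46, "TypeA": "S", "LenB": 49, "TypeB": "S", "Diff": 3, "Score": 0.9, "Response": 1},
    {"ID": "345-678", "LenA": 87, "TypeA": "M", "LenB": 70, "TypeB": "M", "Diff": 17, "Score": 0.7, "Response": 0},
]


def split_ids(rows, id_key="ID", target_key="Response"):
    """Separate IDs and targets from the feature columns, keeping row order aligned."""
    ids = [row[id_key] for row in rows]
    y = [row[target_key] for row in rows]
    X = [[v for k, v in row.items() if k not in (id_key, target_key)] for row in rows]
    return ids, X, y


def stand_in_predict(X):
    # Hypothetical stand-in for clf.predict(X): predicts 1 when the last feature (Score) exceeds 0.5
    return [1 if features[-1] > 0.5 else 0 for features in X]


ids, X, y = split_ids(rows)
predictions = stand_in_predict(X)   # the classifier never sees the IDs
by_id = dict(zip(ids, predictions))  # positions line up, so zip restores the mapping
print(by_id)  # {'123-456': 0, '234-567': 1, '345-678': 1}
```

The only requirement is that nothing reorders or drops rows between splitting off the IDs and predicting.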

How to examine the feature weights of a Tensorflow LinearClassifier?

Question: I am trying to understand the Large-scale Linear Models with TensorFlow documentation. The docs motivate these models as follows: "Linear model can be interpreted and debugged more easily than neural nets. You can examine the weights assigned to each feature to figure out what's having the biggest impact on a prediction." So I ran the extended code example from the accompanying TensorFlow Linear Model Tutorial. In particular, I ran the example code from GitHub with the model-type flag set to

Which resource structure to use for both Android tablets and mobiles?

Question: I last updated my app's resources when higher-density devices came out. I started drawing my icons at a higher resolution and supplied them through the res/drawable-hdpi directory structure. So far, so good. Now I wanted to make some changes to adapt the app for Android tablets. I updated a few layouts, and then I slowly realized why all my icons looked somewhat strange to me on that nice display: the "normal" 10-inch tablets are not classified as hdpi but as mdpi, thus all my "old"