classification | 易学教程

Value of k in k nearest neighbor algorithm

I have 7 classes that need to be classified and 10 features. Is there an optimal value of k for this case, or do I have to run KNN for values of k between 1 and 10 (or so) and let the results determine the best value?

In addition to the article I posted in the comments, there is this one as well, which suggests: Choice of k is very critical – A small value of k means that noise will have a higher influence on the result. A large value makes it computationally expensive and somewhat defeats the basic philosophy behind KNN (that points that are
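The "try several values of k and compare" approach from the question can be sketched as follows. This is a minimal illustration in pure Python, not the method from the cited article: it scores each candidate k by leave-one-out accuracy on a tiny made-up dataset. The function names (`knn_predict`, `loo_accuracy`, `best_k`) and the toy points are assumptions introduced here for illustration.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: each point is predicted from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def best_k(X, y, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two well-separated clusters of four points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["a"] * 4 + ["b"] * 4

k, scores = best_k(X, y, [1, 3, 5, 7])
print(k, scores)
```

On real data, odd values of k avoid ties in two-class problems, and the features usually need scaling first because the distance treats every feature equally.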

SPMD vs. Parfor

I'm new to parallel computing in MATLAB. I have a function that creates a classifier (SVM), and I'd like to test it with several datasets. I have a 2-core workstation, so I'd like to run the tests in parallel. Can someone explain the difference between:

    dataset_array = {dataset1, dataset2};
    matlabpool open 2
    spmd
        my_function(dataset_array{labindex});
    end

and

    dataset_array = {dataset1, dataset2};
    matlabpool open 2
    parfor i = 1:2
        my_function(dataset_array{i});
    end

spmd is a parallel region, while parfor is a parallel for loop. The difference is that in an spmd region you have much more flexibility when it comes

How to implement a hold-out validation in R

Question: Let's say I'm using the Sonar data and I'd like to do a hold-out validation in R. I partitioned the data using createFolds from the caret package as folds <- createFolds(mydata$Class, k=5). I would then like to use exactly the fold mydata[i] as test data and train a classifier using mydata[-i] as training data. My first thought was to use the train function, but I couldn't find any support for hold-out validation. Am I missing something here? Also, I'd like to be able to use exactly the pre

How to set proper arguments to build keras Convolution2D NN model [Text Classification]?

Question: I am trying to use a 2D CNN for text classification of Chinese articles and am having trouble setting the arguments of keras Convolution2D. I know the basic flow of Convolution2D for images, but I am stuck applying it to my dataset with keras.

Input data: My data is 9800 Chinese articles; the maximum sentence length is 6810, with a word2vec size of 200. So the input shape is `(9800, 1, 6810, 200)`.

Code for building the model:

    MAX_FEATURES = 6810
    # I just randomly pick one filter, seems this is the problem?
    nb_filter = 128

Simple example/use-case for a BNT gaussian_CPD?

Question: I am attempting to implement a Naive Bayes classifier using BNT and MATLAB. So far I have been sticking with simple tabular_CPD variables and "guesstimating" probabilities for the variables. My prototype net so far consists of the following:

    DAG = false(5);
    DAG(1, 2:5) = true;
    bnet = mk_bnet(DAG, [2 3 4 3 3]);
    bnet.CPD{1} = tabular_CPD(bnet, 1, [.5 .5]);
    bnet.CPD{2} = tabular_CPD(bnet, 2, [.1 .345 .45 .355 .45 .3]);
    bnet.CPD{3} = tabular_CPD(bnet, 3, [.2 .02 .59 .2 .2 .39 .01 .39]);
    bnet.CPD

How to calculate KNN Variable Importance in R

Question: I implemented an authorship attribution project in which I trained a KNN model with articles from two authors. I then classify the author of a new article as either author A or author B. I use the knn() function to generate the model. The output of the model is the table below.

        Word1  Word2  Word3  Author
    11      1     48      8  A
    2       2      0      0  B
    29      1     45      9  A
    1       2      0      0  B
    4       0      0      0  B
    28      3      1      1  B

As seen from the model, it is obvious that Word2 and Word3 are the most significant variables that

Support Vector Machine works on Training-set but not on Test-set in R (using e1071)

Question: I'm using a support vector machine for my document classification task. It classifies all the articles in my training set, but fails to classify the ones in my test set. trainDTM is the document-term matrix of my training set; testDTM is the one for the test set. Here's my (not so beautiful) code:

    # create data.frame with labelled sentences
    labeled <- as.data.frame(read.xlsx("C:\\Users\\LABELED.xlsx", 1, header=T))
    # create training set and test set
    traindata <- as.data.frame(labeled[1:700,c(

How to plot a ROC curve using ROCR package in r, with only a classification contingency table

How can I plot a ROC curve using the ROCR package in R with only a classification contingency table? I have a contingency table from which the true positive rate, false positive rate, etc. can all be computed. I have 500 replications, and therefore 500 tables. However, I cannot generate prediction data giving the estimated probability and the truth for each individual case. How can I get a curve without the individual data? Below is the package instruction used.

    ## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
    library(ROCR)
    data(ROCR.simple)
    pred <- prediction( ROCR.simple$predictions, ROCR.simple
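A single contingency table corresponds to one decision threshold, so it gives one (FPR, TPR) point rather than a full curve. The sketch below is plain Python, not ROCR, and shows how such points could be computed and joined. The counts are hypothetical, and the function names `roc_point` and `auc_trapezoid` are introduced here for illustration. The question does not say whether the 500 tables come from different thresholds. If they are replications at a single threshold, they form a cloud around one operating point, not a curve.

```python
def roc_point(tp, fp, fn, tn):
    """Return (fpr, tpr) for one 2x2 contingency table."""
    tpr = tp / (tp + fn)  # sensitivity: share of actual positives detected
    fpr = fp / (fp + tn)  # share of actual negatives flagged as positive
    return fpr, tpr


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the given (fpr, tpr)
    points, with the trivial end points (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical tables as (TP, FP, FN, TN), e.g. from two different thresholds.
tables = [(40, 10, 10, 40), (25, 5, 25, 45)]
points = [roc_point(*t) for t in tables]
print(points, auc_trapezoid(points))
```

The functions assume every table contains at least one actual positive and one actual negative. Otherwise the rates are undefined and the division fails.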

classification: PCA and logistic regression using sklearn

Step 0: Problem description

I have a classification problem, i.e. I want to predict a binary target from a collection of numerical features, using logistic regression after running a Principal Components Analysis (PCA). I have 2 datasets, df_train and df_valid (training set and validation set respectively), as pandas data frames containing the features and the target. As a first step, I used the pandas get_dummies function to transform all the categorical variables into booleans. For example, I would have:

    n_train = 10
    np.random.seed(0)
    df_train = pd.DataFrame({"f1":np.random.random(n

Creating an ARFF file from python output

gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1, 'proportionate': 1, 'instructions': 1, 'warned': 2, 'commanders': 1, 'michael': 2, 'exploit': 1, 'culminating'
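For output shaped like the excerpt above (a dictionary mapping each document name to its word counts), a dense ARFF file could be produced roughly as follows. This is a minimal sketch, not the asker's code. The function name `to_arff`, the relation name, and the sample dictionary are assumptions made for illustration, and attribute names are assumed to be plain words that need no quoting.

```python
def to_arff(doc_counts, relation="documents"):
    """Render {doc_name: {word: count}} as dense ARFF text.

    One numeric attribute per distinct word, one data row per document.
    Document names are used only to order the rows and are not written out.
    """
    vocab = sorted({w for counts in doc_counts.values() for w in counts})
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {w} numeric" for w in vocab]
    lines += ["", "@data"]
    for name in sorted(doc_counts):
        counts = doc_counts[name]
        # Words missing from a document get a count of 0.
        lines.append(",".join(str(counts.get(w, 0)) for w in vocab))
    return "\n".join(lines) + "\n"


# Hypothetical sample in the same shape as the excerpt.
sample = {
    "doc_a.html": {"march": 6, "policing": 3},
    "doc_b.html": {"march": 1, "traffic": 2},
}
arff_text = to_arff(sample, relation="articles")
print(arff_text)
```

For a large vocabulary most counts are zero, and ARFF's sparse row syntax would be more compact. For classification, a nominal class attribute would also need to be appended to each row.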