Classification

Neural Network in Python: Decision/Classification always gives 0.5

北城以北 submitted on 2019-12-24 02:10:00
Question: First of all, I want to say that I am a Python beginner and also completely new to neural networks. When I read about them I was very excited and decided to set up a little network from scratch (see code below). But somehow my code is not working properly. I guess there are some major bugs (in the algorithm or in the programming?), but I cannot find them at the moment. In the handwritten notes you can see my system (and some formulas). I want to solve a decision problem where I have data in the form …
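For orientation, a constant output of 0.5 from a sigmoid network usually points at weights that never move, e.g. an all-zero initialization or a broken backpropagation step. Below is a minimal from-scratch sketch of a sigmoid binary classifier (all data and names are illustrative, not the asker's code) showing the pieces that must be in place:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                        # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]   # toy 0/1 labels

# Non-zero random initialization matters: all-zero weights leave a
# sigmoid output stuck at 0.5 forever.
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))

lr = 0.5
for _ in range(2000):
    h = sigmoid(X @ W1 + b1)          # forward pass, hidden layer
    p = sigmoid(h @ W2 + b2)          # forward pass, output in (0, 1)
    d2 = (p - y) / len(X)             # cross-entropy gradient at the output
    d1 = (d2 @ W2.T) * h * (1 - h)    # chain rule through the sigmoid
    W2 -= lr * (h.T @ d2)
    b2 -= lr * d2.sum(0, keepdims=True)
    W1 -= lr * (X.T @ d1)
    b1 -= lr * d1.sum(0, keepdims=True)

print(((p > 0.5) == y).mean())        # accuracy should rise well above 0.5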

Cross-validation with a KNN classifier in MATLAB

大憨熊 submitted on 2019-12-24 00:50:53
Question: I am trying to extend this answer to a KNN classifier:

load fisheriris;
% // convert species to double
isnum = cellfun(@isnumeric, species);
result = NaN(size(species));
result(isnum) = [species{isnum}];
% // Crossvalidation
vals = crossval(@(XTRAIN, YTRAIN, XTEST, YTEST) fun_knn(XTRAIN, YTRAIN, XTEST, YTEST), meas, result);

The fun_knn function is:

function testval = fun_knn(XTRAIN, YTRAIN, XTEST, YTEST)
yknn = knnclassify(XTEST, XTRAIN, YTRAIN);
[~, classNet] = max(yknn, [], 2);
[~, classTest] = max …
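For comparison, the same cross-validated KNN evaluation is a few lines in Python with scikit-learn; this is an equivalent sketch, not the asker's MATLAB code:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# load_iris plays the role of fisheriris: meas -> X, species (as integers) -> y
X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=10)
print(scores.mean())  # mean accuracy across the 10 folds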

Handle null/NaN values in a Spark MLlib classifier

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-23 23:04:05
Question: I have a set of categorical columns (strings) that I'm parsing and converting into Vectors of features to pass to an MLlib classifier (random forest). In my input data, some columns have null values. Say, in one of those columns, I have p values plus a null value: how should I build my feature Vectors and the categoricalFeaturesInfo map of the classifier? Option 1: I declare p values in categoricalFeaturesInfo, and I use Double.NaN in my input Vectors? Side question: how are NaNs handled by …
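One common workaround, offered here as an assumption rather than the accepted answer, is to give null its own category index, so each such column has p + 1 categories and no NaN ever reaches the trees. A minimal sketch with the RDD-based MLlib API (all names illustrative):

from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import RandomForest

def encode(value, mapping):
    # mapping holds the p known string values; index p is reserved for null/unknown
    if value is None:
        return len(mapping)
    return mapping.get(value, len(mapping))

color_map = {"red": 0, "green": 1, "blue": 2}   # p = 3 known values
point = LabeledPoint(1.0, Vectors.dense([encode(None, color_map)]))  # null -> 3

# Feature 0 now has arity p + 1 = 4 (requires an active SparkContext `sc`):
# model = RandomForest.trainClassifier(sc.parallelize([point]), numClasses=2,
#                                      categoricalFeaturesInfo={0: 4}, numTrees=10)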

Accuracy gets worse the longer I train a Keras model

送分小仙女□ submitted on 2019-12-23 21:24:36
Question: I'm currently using a ResNet built in Keras to do two-class classification. I am using ModelCheckpoint to save the best models based on validation accuracy. Better and better models are saved until I have gone through all my data points a few times. Keras keeps saving new models showing they have higher accuracy, but when I test the models they perform worse than previous models. Here is the output of testing each model with validation data. The first number in the model name is the epoch, the …
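For reference, a standard checkpoint-on-best-validation setup looks like the sketch below (file names and data are illustrative; in older standalone Keras the monitored key is 'val_acc' rather than 'val_accuracy'). A mismatch like the one described often means the data the checkpoint scores against differs from the data used in the later test, e.g. different preprocessing or shuffling, though the excerpt cuts off before the details:

from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    "model_{epoch:02d}_{val_accuracy:.3f}.h5",  # epoch number and score in the name
    monitor="val_accuracy",   # quantity compared between epochs
    save_best_only=True,      # only overwrite when the monitored value improves
    mode="max",               # higher accuracy is better
)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[checkpoint])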

Classifying unlabelled data in Weka

邮差的信 submitted on 2019-12-23 20:47:58
Question: I'm currently using various classifiers in Weka. My testing data is labelled, e.g.:

@relation bmwreponses
@attribute IncomeBracket {0,1,2,3,4,5,6,7}
@attribute FirstPurchase numeric
@attribute LastPurchase numeric
@attribute responded {1,0}
@data
4,200210,200601,0
5,200301,200601,1
6,200411,200601,0
5,199609,200603,0
6,200310,200512,1
...

The last value per row is the class element, i.e. responded. But if I try unlabelled test data, e.g.:

@relation bmwreponses
@attribute IncomeBracket {0,1,2 …
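For context, ARFF represents an unknown value with a question mark, so an unlabelled test file keeps the responded attribute declared but uses ? in the class position of each data row, e.g.:

@relation bmwreponses
@attribute IncomeBracket {0,1,2,3,4,5,6,7}
@attribute FirstPurchase numeric
@attribute LastPurchase numeric
@attribute responded {1,0}
@data
4,200210,200601,?
5,200301,200601,?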

Create class intervals in R and sum values

偶尔善良 submitted on 2019-12-23 18:56:16
Question: I have a set of data (cost and distance). I want to aggregate it into n classes depending on the distance and find the sum of the cost for the aggregated data. Here are some example tables.

Nam  Cost   distance
1    1005   10
2    52505  52
3    51421  21
4    651    10
5    656    0
6    5448   1

Classes:

Class  From  To
1      0     5
2      5     15
3      15    100

Result:

Class  Sum
1      6104
2      1656
3      103926

I am doing this, but it takes a lot of time to process. I am sure that there is a better way to do it:

for (i in 1:6) {
  for (j in 1:3) {
    if((Table_numbers[i …
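The slow nested loop can be replaced by binning followed by a grouped sum. As a point of comparison, here is the same aggregation in Python with pandas (an equivalent sketch, not the asker's R code):

import pandas as pd

data = pd.DataFrame({"Cost": [1005, 52505, 51421, 651, 656, 5448],
                     "distance": [10, 52, 21, 10, 0, 1]})

# Assign each row a class from the From/To boundaries, then sum Cost per class.
data["Class"] = pd.cut(data["distance"], bins=[0, 5, 15, 100],
                       labels=[1, 2, 3], include_lowest=True)
print(data.groupby("Class", observed=True)["Cost"].sum())
# Class 1: 6104, Class 2: 1656, Class 3: 103926 -- matching the Result table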

Minimum Distance Algorithm using GDAL and Python

こ雲淡風輕ζ submitted on 2019-12-23 10:29:15
Question: I'm trying to implement the Minimum Distance Algorithm for image classification using GDAL and Python. After calculating the mean pixel value of the sample areas and storing the means in a list of arrays ("sample_array"), I read the image into an array called "values". With the following code I loop through this array:

values = valBD.ReadAsArray()

# loop through pixel columns
for X in range(0, XSize):
    # loop through pixel lines
    for Y in range(0, YSize):
        # initialize variables
        minDist = 9999
        # …
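Pure-Python per-pixel loops like this are very slow; the same minimum-distance rule can be vectorized with NumPy broadcasting. A sketch under the assumption of a single-band image and one mean per class (names are illustrative):

import numpy as np

values = np.random.rand(512, 512)          # stand-in for valBD.ReadAsArray()
sample_means = np.array([0.2, 0.5, 0.8])   # stand-in for the sample_array means

# Distance from every pixel to every class mean in one broadcast step;
# the index of the nearest mean is the pixel's class.
dist = np.abs(values[:, :, None] - sample_means[None, None, :])
classified = np.argmin(dist, axis=2)       # (rows, cols) array of class indices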

ValueError when using GridSearchCV

馋奶兔 submitted on 2019-12-23 09:34:30
Question: I am using GridSearchCV to do classification and my code is:

parameter_grid_SVM = {'dual': [True, False],
                      'loss': ["squared_hinge", "hinge"],
                      'penalty': ["l1", "l2"]}
clf = GridSearchCV(LinearSVC(), param_grid=parameter_grid_SVM, verbose=2)
clf.fit(trian_data, labels)

And then I get the error:

ValueError: Unsupported set of arguments: penalty='l1' is only supported when dual='false'., Parameters: penalty='l1', loss='hinge', dual=False

Later on I changed my code to:

clf = GridSearchCV(LinearSVC …
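GridSearchCV tries the full Cartesian product of the grid, and LinearSVC rejects several of those combinations (penalty='l1' is only valid with loss='squared_hinge' and dual=False). Passing a list of grids restricts the search to supported combinations; a runnable sketch with stand-in data:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)  # stand-in data

# Each dict enumerates only combinations that LinearSVC accepts.
parameter_grid_SVM = [
    {"penalty": ["l2"], "loss": ["hinge"], "dual": [True]},
    {"penalty": ["l2"], "loss": ["squared_hinge"], "dual": [True, False]},
    {"penalty": ["l1"], "loss": ["squared_hinge"], "dual": [False]},
]
clf = GridSearchCV(LinearSVC(max_iter=10000), param_grid=parameter_grid_SVM, verbose=2)
clf.fit(X, y)
print(clf.best_params_)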

Unable to use FeatureUnion in scikit-learn due to different dimensions

放肆的年华 submitted on 2019-12-23 07:04:04
Question: I'm trying to use FeatureUnion to extract different features from a data structure, but it fails due to different dimensions:

ValueError: blocks[0,:] has incompatible row dimensions

Implementation

My FeatureUnion is built the following way:

features = FeatureUnion([
    ('f1', Pipeline([
        ('get', GetItemTransformer('f1')),
        ('transform', vectorizer_f1)
    ])),
    ('f2', Pipeline([
        ('get', GetItemTransformer('f2')),
        ('transform', vectorizer_f1)
    ]))
])

GetItemTransformer is used to get different parts of …
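The excerpt cuts off before GetItemTransformer itself is shown. A typical implementation of such a selector, sketched here in the spirit of scikit-learn's ItemSelector example rather than the asker's exact class, must return exactly one item per input record; if the two branches yield different row counts, FeatureUnion's horizontal stack fails with exactly this error:

from sklearn.base import BaseEstimator, TransformerMixin

class GetItemTransformer(BaseEstimator, TransformerMixin):
    """Select one field from dict-like records so a vectorizer can consume it."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # One output item per input record keeps both union branches
        # at the same number of rows.
        return [record[self.key] for record in X]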

How to analyse and predict (machine learning) a time-series data set using scikit-learn for Python

泪湿孤枕 submitted on 2019-12-23 06:39:19
Question: I have a data set like this, and I need to analyse and predict the status column. These are just two entries from the training data set. In this data set there is a heart-rate pattern (collected at one-second intervals, 10 numbers altogether); it's a time-series array (correct me if I'm wrong). I just need to know the best way to analyse this data and get a prediction from it. I'm using scikit-learn for my data mining and machine learning. What I want to know is the best way to analyse these …
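Because every record is a fixed-length window of 10 readings, one standard approach, stated here as an assumption since the excerpt cuts off before any answer, is to turn each window (or summary statistics of it) into a feature vector for an ordinary classifier:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
heart_rates = rng.integers(55, 120, size=(100, 10))   # 100 windows of 10 readings
status = (heart_rates.mean(axis=1) > 85).astype(int)  # toy "status" labels

# Simple summary statistics per window often beat the raw samples.
features = np.column_stack([heart_rates.mean(axis=1),
                            heart_rates.std(axis=1),
                            heart_rates.min(axis=1),
                            heart_rates.max(axis=1)])

X_train, X_test, y_train, y_test = train_test_split(features, status, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # held-out accuracy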