classification | 易学教程

Data Prediction using Decision Tree of rpart

Question: I am using R to classify a data frame called 'd' containing data structured like below: The data has 576666 rows, and the column "classLabel" is a factor with 3 levels: ONE, TWO, THREE. I am building a decision tree using rpart: fitTree = rpart(d$classLabel ~ d$tripduration + d$from_station_id + d$gender + d$birthday) And I want to predict the values of "classLabel" for newdata: newdata = data.frame( tripduration=c(345,244,543,311), from_station_id=c(60,28,100,56), gender=c("Male","Female",

How does List::Util 'shuffle' actually work?

Question: I am currently working on building a classifier using C5.0. I have a dataset of 8000 entries, and each entry has its own ID number (1-8000). When testing the performance of the classifier, I had to make 5 sets of 10:90 (training data : test data) splits. Of course, no training case may appear again among the test cases, and duplicates cannot occur in either set. To solve the problem of picking examples at random for the training data, and making sure the same cannot be picked for the test data
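The requirement described above (random 10:90 splits with no overlap and no duplicates) can be sketched by shuffling the ID list and slicing it. The sketch below is in Python rather than the Perl of the question title, and the function name `make_splits`, its parameters, and the fixed seed are illustrative assumptions, not the asker's code. Because a shuffle is a permutation of the IDs, the two slices are disjoint and each ID appears exactly once.

```python
import random


def make_splits(ids, n_splits=5, train_fraction=0.10, seed=0):
    """Return n_splits (train_ids, test_ids) pairs.

    Hypothetical helper: each pair comes from one shuffled copy of ids,
    so train and test never share an ID and neither contains duplicates.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n_train = round(len(ids) * train_fraction)
    splits = []
    for _ in range(n_splits):
        shuffled = list(ids)
        rng.shuffle(shuffled)  # in-place random permutation
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


ids = range(1, 8001)  # ID numbers 1-8000, as in the question
splits = make_splits(ids)
```

Each of the 5 pairs is drawn independently, so an ID may be a training case in one split and a test case in another; only within a single split are the sets guaranteed to be disjoint.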

sklearn multiclass svm function

Question: I have multiclass labels and want to compute the accuracy of my model. I am confused about which sklearn function I need to use. As far as I understand, the code below is only used for binary classification. # dividing X, y into train and test data X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0) # training a linear SVM classifier from sklearn.svm import SVC svm_model_linear = SVC(kernel='linear', C=1).fit(X_train, y_train) svm
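Accuracy is defined the same way for any number of classes: the fraction of predictions that equal the true label. The plain-Python sketch below illustrates this with made-up labels; the function name `accuracy` and the sample data are assumptions for illustration. In scikit-learn, `sklearn.metrics.accuracy_score` computes the same quantity for multiclass labels, and a fitted `SVC` reports mean accuracy through its `score` method.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical three-class example: 3 of 5 predictions are correct.
y_true = ["cat", "dog", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "bird"]
acc = accuracy(y_true, y_pred)
```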

Extracting Information from the Decision Rules in rpart package

Question: I need to extract information from the rules in a decision tree. I am using the rpart package in R, with the demo data from the package to explain my requirements: data(stagec) fit <- rpart(formula = pgstat ~ age + eet + g2 + grade + gleason + ploidy, data = stagec, method = "class", control = rpart.control(cp = 0.05)) fit Printing fit shows: n= 146 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 146 54 0 (0.6301370 0.3698630) 2) grade< 2.5 61 9 0 (0.8524590 0.1475410) * 3) grade>=2

How to plot classification borders on a Linear Discriminant Analysis plot in R

Question: I have used a linear discriminant analysis (LDA) to investigate how well a set of variables discriminates between 3 groups. I then used the plot.lda() function to plot my data on the two linear discriminants (LD1 on the x-axis and LD2 on the y-axis). I would now like to add the classification borders from the LDA to the plot. I cannot see an argument in the function that allows this. The partimat() function allows visualisation of the LD classification borders, but variables are used as the x

Classification with pretrained pytorch vgg16 model and its classes

Question: I wrote an image classification model with PyTorch's pretrained vgg16 model. import matplotlib.pyplot as plt import numpy as np import torch from PIL import Image import urllib from skimage.transform import resize from skimage import io import yaml # Downloading imagenet 1000 classes list file = urllib.request.urlopen("https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt") classes = '' for f in file:
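The excerpt is cut off before the actual question, so the following is only a sketch of the step the code appears to be building toward: turning a class-index-to-label file into a lookup and reading off the label of the highest-scoring class. It assumes the labels file is a Python-style dict literal; the two-entry `labels_text` string and the `scores` list are hypothetical stand-ins for the downloaded file and the model output.

```python
import ast

# Hypothetical two-entry excerpt, assuming the labels file is a dict literal
# mapping class index -> label text.
labels_text = "{0: 'tench, Tinca tinca',\n 1: 'goldfish, Carassius auratus'}"
classes = ast.literal_eval(labels_text)  # safely parse the literal into a dict

# Hypothetical per-class scores standing in for the model's output vector.
scores = [0.2, 1.7]

# Index of the largest score, then its human-readable label.
top_index = max(range(len(scores)), key=scores.__getitem__)
top_label = classes[top_index]
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for text fetched from a URL.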

Size-1 array error when preparing decision model

Question: I have a DataFrame called data with 477154 rows.

   PDB_ID  Chain  Sequence                 Secstr
0  101M    A      GEWQLVLHVWAKVEA          | HHHH HHHHGG|
1  102L    A      MVLSEGEWKVEA             |HHHH HHHHHH|
2  102M    A      MVLSEGEWQLVLHVWAKVEA     |HHHHHHHHHGGHH HHH |
3  103L    A      MVLSEGEWQLVLHVWAKV       | HHHHH HHHHHH HH|
4  103L    B      MVLSEGEWQLVLHVWAKVEAVAL  | HHHHH HHHHHH HHHHH |

My goal is to get each character, one by one, from the columns 'Sequence' and 'Secstr' into arrays and make them usable for classification. Every row has a different number of elements. I tried to do it
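Since every row has a different length, one way to get arrays usable for classification is to flatten all rows into a single list of (residue, structure) pairs, so that each character becomes one sample. The sketch below assumes that the sequence and structure strings of a row have equal length; the helper name `to_residue_pairs` and the toy strings are hypothetical and are not taken from the DataFrame above.

```python
def to_residue_pairs(sequences, structures):
    """Flatten aligned strings into one list of (residue, structure) pairs."""
    pairs = []
    for seq, sec in zip(sequences, structures):
        if len(seq) != len(sec):
            # Assumption: each structure string is aligned with its sequence.
            raise ValueError("sequence and structure lengths differ")
        pairs.extend(zip(seq, sec))  # one pair per character position
    return pairs


# Hypothetical toy rows of different lengths; a space means "no structure".
sequences = ["GEW", "MVLS"]
structures = [" HH", "HH  "]

pairs = to_residue_pairs(sequences, structures)
X = [aa for aa, _ in pairs]  # one feature character per sample
y = [ss for _, ss in pairs]  # one label character per sample
```

Flattening avoids building a ragged array of per-row lists, which NumPy cannot store as a regular numeric array.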

How to fix “numpy.core._exceptions.MemoryError” while performing MNIST digit classifier?

Question: I am building a Stochastic Gradient Descent classifier (SGDClassifier) using scikit-learn. While fitting my training data (of shape (60000, 784)), I get a memory error. How can I fix it? I have already tried switching from a 32-bit to a 64-bit IDE. Reducing the training data would decrease performance, so that is not really an option. Code: (Python 3.7) # Classification Problem # Date: 1st September 2019 # Author: Pranay Saha import pandas as pd x_train= pd.read_csv('mnist_train.csv') y