classification

Data Prediction using Decision Tree of rpart

妖精的绣舞 submitted on 2021-02-18 22:28:47
Question: I am using R to classify a data frame called 'd' containing data structured like below: The data has 576666 rows, and the column "classLabel" is a factor with 3 levels: ONE, TWO, THREE. I am building a decision tree using rpart: fitTree = rpart(d$classLabel ~ d$tripduration + d$from_station_id + d$gender + d$birthday) And I want to predict the values of "classLabel" for newdata: newdata = data.frame( tripduration=c(345,244,543,311), from_station_id=c(60,28,100,56), gender=c("Male","Female",

How does List::Util 'shuffle' actually work?

允我心安 submitted on 2021-02-18 12:33:04
Question: I am currently working on building a classifier using C5.0. I have a dataset of 8000 entries, and each entry has its own ID number (1-8000). When testing the performance of the classifier I had to make 5 sets of 10:90 (training data : test data) splits. Of course, no training case may appear again in the test cases, and duplicates cannot occur in either set. To solve the problem of picking examples at random for the training data, and making sure the same cannot be picked for the test data
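The splitting requirement described in this question can be sketched in Python as follows. This is a minimal illustration, not the approach from the original thread: the function name `make_splits`, the seed, and the use of the standard-library `random` module are assumptions. Each split shuffles a copy of the IDs once and cuts it at the 10% mark, so a training ID cannot reappear in the test part and neither part contains duplicates.

```python
import random


def make_splits(ids, n_splits=5, train_fraction=0.10, seed=0):
    """Return n_splits (train_ids, test_ids) pairs with no overlap.

    Hypothetical helper: shuffling a copy and slicing it guarantees
    that the two parts of one split are disjoint and duplicate-free.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    ids = list(ids)
    n_train = round(len(ids) * train_fraction)
    splits = []
    for _ in range(n_splits):
        shuffled = ids[:]          # copy, so the original order is untouched
        rng.shuffle(shuffled)      # in-place random permutation
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# IDs 1..8000, as in the question
splits = make_splits(range(1, 8001))
```

Different splits may share training IDs with each other; only the two halves of a single split are guaranteed to be disjoint.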

sklearn multiclass svm function

久未见 submitted on 2021-02-18 08:30:48
Question: I have multiclass labels and want to compute the accuracy of my model. I am somewhat confused about which sklearn function I need to use. As far as I understand, the code below is only used for binary classification. # dividing X, y into train and test data X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,random_state = 0) # training a linear SVM classifier from sklearn.svm import SVC svm_model_linear = SVC(kernel = 'linear', C = 1).fit(X_train, y_train) svm
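Accuracy is defined the same way for any number of classes: the fraction of predictions that equal the true label. The sketch below computes it in plain Python, together with per-pair confusion counts; the label values and both helper names are made up for illustration. scikit-learn's `accuracy_score` computes the same quantity for multiclass labels, and `SVC` itself accepts more than two classes.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Hypothetical three-class labels, not data from the question
y_true = ["a", "b", "c", "a", "b", "c"]
y_pred = ["a", "b", "a", "a", "c", "c"]

acc = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)
```

The confusion counts show which classes are mixed up, which a single accuracy number hides.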

Extracting Information from the Decision Rules in rpart package

社会主义新天地 submitted on 2021-02-16 20:58:13
Question: I need to extract information from the rules in a decision tree. I am using the rpart package in R. I am using the demo data in the package to explain my requirements: data(stagec) fit<- rpart(formula = pgstat ~ age + eet + g2 + grade + gleason + ploidy, data = stagec, method = "class", control=rpart.control(cp=0.05)) fit Printing fit shows n= 146 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 146 54 0 (0.6301370 0.3698630) 2) grade< 2.5 61 9 0 (0.8524590 0.1475410) * 3) grade>=2
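One way to get at the fields of each rule is to parse the printed node lines themselves. The Python sketch below does this with a regular expression for the two complete lines quoted in the question. The pattern and the `parse_node` helper are assumptions based only on the column legend shown (node, split, n, loss, yval, yprob, `*` for terminal nodes), and it is not an rpart API.

```python
import re

# Assumed layout: "<node>) <split> <n> <loss> <yval> (<yprob ...>) [*]"
NODE_PATTERN = re.compile(
    r"^\s*(?P<node>\d+)\)\s+(?P<split>.+?)\s+(?P<n>\d+)\s+(?P<loss>\d+)"
    r"\s+(?P<yval>\S+)\s+\((?P<yprob>[^)]*)\)\s*(?P<leaf>\*)?\s*$"
)


def parse_node(line):
    """Turn one printed node line into a dict of its fields."""
    match = NODE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognised node line: {line!r}")
    return {
        "node": int(match["node"]),
        "split": match["split"],
        "n": int(match["n"]),
        "loss": int(match["loss"]),
        "yval": match["yval"],
        "yprob": [float(p) for p in match["yprob"].split()],
        "terminal": match["leaf"] is not None,  # '*' marks a terminal node
    }


# The two complete node lines quoted in the question
printed = [
    "1) root 146 54 0 (0.6301370 0.3698630)",
    "2) grade< 2.5 61 9 0 (0.8524590 0.1475410) *",
]
nodes = [parse_node(line) for line in printed]
```

Parsing printed output is fragile; if the fitted object is available in R, reading the fields from it directly is likely more robust.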

Extracting Information from the Decision Rules in rpart package

送分小仙女 submitted on 2021-02-16 20:58:06
Question: I need to extract information from the rules in a decision tree. I am using the rpart package in R. I am using the demo data in the package to explain my requirements: data(stagec) fit<- rpart(formula = pgstat ~ age + eet + g2 + grade + gleason + ploidy, data = stagec, method = "class", control=rpart.control(cp=0.05)) fit Printing fit shows n= 146 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 146 54 0 (0.6301370 0.3698630) 2) grade< 2.5 61 9 0 (0.8524590 0.1475410) * 3) grade>=2

How to plot classification borders on a Linear Discriminant Analysis plot in R

有些话、适合烂在心里 submitted on 2021-02-16 08:54:47
Question: I have used a linear discriminant analysis (LDA) to investigate how well a set of variables discriminates between 3 groups. I then used the plot.lda() function to plot my data on the two linear discriminants (LD1 on the x-axis and LD2 on the y-axis). I would now like to add the classification borders from the LDA to the plot. I cannot see an argument in the function that allows this. The partimat() function allows visualisation of the LD classification borders, but variables are used as the x

Classification with pretrained pytorch vgg16 model and its classes

大兔子大兔子 submitted on 2021-02-11 15:54:33
Question: I wrote an image classification model with PyTorch's pretrained vgg16 model. import matplotlib.pyplot as plt import numpy as np import torch from PIL import Image import urllib.request from skimage.transform import resize from skimage import io import yaml # Downloading imagenet 1000 classes list file = urllib.request.urlopen("https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt") classes = '' for f in file:
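The step from raw model outputs to class names can be sketched without the model itself. The Python example below assumes the downloaded label file is a Python-dict-style literal mapping class index to name. It uses a three-entry sample string in that assumed format and made-up logits in place of real vgg16 outputs. The helper names `parse_labels`, `softmax` and `top_k` are hypothetical.

```python
import ast
import math

# Illustrative three-entry sample in the ASSUMED format of the label file;
# the real file would have 1000 entries.
SAMPLE_LABELS_TEXT = """{0: 'tench, Tinca tinca',
 1: 'goldfish, Carassius auratus',
 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias'}"""


def parse_labels(text):
    """Parse a dict-style literal into {class_index: class_name}."""
    return ast.literal_eval(text)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=2):
    """Return the k most probable (class_name, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


labels = parse_labels(SAMPLE_LABELS_TEXT)
# Made-up logits standing in for one row of model output
predictions = top_k([0.5, 2.0, -1.0], labels, k=2)
```

With a real model, the logits list would be one row of the output tensor converted to Python floats, and the index order must match the label file.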

Classification with pretrained pytorch vgg16 model and its classes

自古美人都是妖i submitted on 2021-02-11 15:54:21
Question: I wrote an image classification model with PyTorch's pretrained vgg16 model. import matplotlib.pyplot as plt import numpy as np import torch from PIL import Image import urllib.request from skimage.transform import resize from skimage import io import yaml # Downloading imagenet 1000 classes list file = urllib.request.urlopen("https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt") classes = '' for f in file:

Size-1 array error when preparing decision model

半腔热情 submitted on 2021-02-11 15:02:26
Question: I have a DataFrame called data with 477154 rows. PDB_ID Chain Sequence Secstr 0 101M A GEWQLVLHVWAKVEA | HHHH HHHHGG| 1 102L A MVLSEGEWKVEA |HHHH HHHHHH| 2 102M A MVLSEGEWQLVLHVWAKVEA |HHHHHHHHHGGHH HHH | 3 103L A MVLSEGEWQLVLHVWAKV | HHHHH HHHHHH HH| 4 103L B MVLSEGEWQLVLHVWAKVEAVAL | HHHHH HHHHHH HHHHH | My goal is to get each character one by one from the columns 'Sequence' and 'Secstr' into arrays and make them usable for classification. Every row has a different number of elements. I tried to do it
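The per-character extraction described here can be sketched as flattening each (Sequence, Secstr) pair into two parallel lists, one item per residue. The rows below are short hypothetical examples, not the data from the question, and the sketch assumes the `|` delimiters have already been stripped so that both strings in a row have the same length. A space in Secstr is kept as its own label.

```python
def per_residue_pairs(rows):
    """Flatten (sequence, secstr) string pairs into parallel per-character lists.

    Hypothetical helper: rows may have different lengths, but within one
    row the two strings must align character for character.
    """
    residues, labels = [], []
    for sequence, secstr in rows:
        if len(sequence) != len(secstr):
            raise ValueError(
                f"length mismatch: {len(sequence)} residues, {len(secstr)} labels"
            )
        residues.extend(sequence)  # one entry per amino-acid letter
        labels.extend(secstr)      # one entry per structure symbol
    return residues, labels


# Hypothetical rows of unequal length
rows = [("GEWQ", " HH "), ("MVLSEG", "HHH  G")]
residues, labels = per_residue_pairs(rows)
```

Because the output is one flat sample per character, rows of different lengths no longer need to fit into a rectangular array.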

How to fix “numpy.core._exceptions.MemoryError” while performing MNIST digit classifier?

旧时模样 submitted on 2021-02-11 14:53:42
Question: I am making a Stochastic Gradient Descent Classifier (SGDClassifier) using scikit-learn. While fitting my training data (of shape (60000,784)), I am getting a memory error. How do I fix it? I have already tried switching from a 32-bit to a 64-bit IDE. Reducing the training data would decrease the performance (so that is basically not an option). Code: (Python 3.7) # Classification Problem # Date: 1st September 2019 # Author: Pranay Saha import pandas as pd x_train= pd.read_csv('mnist_train.csv') y
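One common way to lower peak memory without discarding training data is to read the CSV in batches and train incrementally. The sketch below shows only the batching half, using the standard library. It assumes a header row followed by rows whose first column is the label and whose remaining columns are pixel values. That layout is an assumption about `mnist_train.csv`, and `iter_batches` is a hypothetical helper. Each yielded batch could then be passed to an incremental learner; scikit-learn's `SGDClassifier` offers `partial_fit` for that purpose.

```python
import csv
import io


def iter_batches(csv_file, batch_size):
    """Yield (features, labels) batches from an open CSV file.

    ASSUMED layout: one header row, then label in column 0 and
    integer pixel values in the remaining columns.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    features, labels = [], []
    for row in reader:
        labels.append(int(row[0]))
        features.append([int(value) for value in row[1:]])
        if len(labels) == batch_size:
            yield features, labels
            features, labels = [], []  # start a fresh batch
    if labels:  # final, possibly smaller batch
        yield features, labels


# Tiny in-memory stand-in for the real file (two "pixels" per row)
demo_csv = io.StringIO(
    "label,p0,p1\n"
    "5,0,255\n"
    "0,12,0\n"
    "4,0,0\n"
    "1,7,9\n"
    "9,3,3\n"
)
batch_sizes = [len(batch_labels) for _, batch_labels in iter_batches(demo_csv, 2)]
```

Only one batch is held in memory at a time, so peak usage depends on `batch_size` rather than on the full 60000 rows.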