machine-learning

NLTK & Python, plotting ROC curve

时光怂恿深爱的人放手 posted on 2021-02-10 13:29:13
Question: I am using nltk with Python and I would like to plot the ROC curve of my classifier (Naive Bayes). Is there a function for plotting it, or do I have to track the true-positive and false-positive rates myself? It would be great if someone could point me to some code that already does it. Thanks. Answer 1: PyROC looks simple enough: tutorial, source code. This is how it would work with the NLTK naive Bayes classifier: # class labels are 0 and 1 labeled_data = [ (1, featureset_1), (0, featureset_2),
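Since the PyROC snippet is cut off, here is a minimal alternative sketch using scikit-learn and matplotlib; the `y_score` values are hypothetical stand-ins for what `classifier.prob_classify(fs).prob(1)` would return from a trained NLTK classifier:

```python
from sklearn.metrics import roc_curve, auc
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical gold labels and classifier probability scores for class 1.
y_true  = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]

# roc_curve tracks the true/false positive rates over all thresholds.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"ROC (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc.png")
```

The curve always starts at (0, 0) and ends at (1, 1); the area under it (AUC) summarizes the classifier's ranking quality.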

How do you estimate the performance of a classifier on test data?

余生长醉 posted on 2021-02-10 12:36:43
Question: I'm using scikit-learn to build a supervised classifier, and I am currently tuning it to give good accuracy on the labeled data. But how do I estimate how well it does on the (unlabeled) test data? Also, how do I find out whether I'm starting to overfit the classifier? Answer 1: You can't score your method on unlabeled data, because you need to know the right answers. To evaluate a method, you should split your training set into (new) train and test sets (via sklearn.cross_validation.train_test_split, for example
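The split-and-compare idea the answer describes can be sketched as follows (the iris dataset and decision tree are stand-ins for the asker's unspecified data and model; note that newer scikit-learn versions expose `train_test_split` under `sklearn.model_selection`):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out labeled data so there are known answers to score against.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
# A large gap between train_acc and test_acc is the classic overfitting signal.

# k-fold cross-validation gives a more stable estimate than one split.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
```

An unrestricted decision tree typically scores near-perfectly on its own training data, which is exactly why the held-out score is the one to trust.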

Tensorflow-IO Dataset input pipeline with very large HDF5 files

孤街醉人 posted on 2021-02-10 12:19:31
Question: I have very big training files (30 GB). Since all the data does not fit in my available RAM, I want to read it in batches. I saw that the Tensorflow-io package implements a way to read HDF5 into Tensorflow via the function tfio.IODataset.from_hdf5(). Then, since tf.keras.Model.fit() takes a tf.data.Dataset as input containing both samples and targets, I need to zip my X and Y together and then use .batch and .prefetch to load into memory just the necessary data. For
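Since the question is cut off, here is a framework-free sketch of the same read-by-batch idea using a NumPy memmap as a stand-in for the on-disk HDF5 data (in the real pipeline, `tfio.IODataset.from_hdf5`, `tf.data.Dataset.zip`, `.batch`, and `.prefetch` would play these roles):

```python
import numpy as np
import tempfile, os

# Create a hypothetical on-disk dataset: 1000 samples of 8 features,
# memory-mapped so the full array never has to live in RAM at once.
path = os.path.join(tempfile.mkdtemp(), "X.dat")
X = np.memmap(path, dtype="float32", mode="w+", shape=(1000, 8))
X[:] = np.random.rand(1000, 8)
X.flush()

def batches(mmap_path, shape, batch_size):
    """Yield one batch at a time, copying only that slice into memory."""
    arr = np.memmap(mmap_path, dtype="float32", mode="r", shape=shape)
    for start in range(0, shape[0], batch_size):
        yield np.array(arr[start:start + batch_size])  # materialize one batch

first = next(batches(path, (1000, 8), 64))
n_batches = sum(1 for _ in batches(path, (1000, 8), 64))
```

With 1000 samples and a batch size of 64, the generator yields 16 batches (the last one partial), mirroring what `.batch(64)` does on a `tf.data.Dataset`.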

Polynomial Regression values generated too far from the coordinates

主宰稳场 posted on 2021-02-10 06:38:29
Question: With the code below for the polynomial-regression coefficients, when I calculate the regression value at any x point, the value obtained is far away from the corresponding y coordinate (especially for the coordinates below). Can anyone explain why the difference is so high, whether it can be minimized, or whether there is a flaw in my understanding? The current requirement is a difference of no more than 150 at every point. import numpy as np x=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100] y=[0,885
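Because the y list is truncated, here is a sketch with hypothetical data of the same shape (a noisy quadratic over the question's x grid) showing how the polynomial degree drives the worst-case residual — the usual cause of such large gaps is fitting a degree too low for the trend:

```python
import numpy as np

# Hypothetical data standing in for the question's truncated x/y lists.
x = np.arange(0, 101, 5, dtype=float)
rng = np.random.default_rng(0)
y = 0.5 * x**2 + 3 * x + rng.normal(0, 10, x.size)

# Fit several degrees and inspect the worst-case residual for each.
errs = {}
for deg in (1, 2, 4):
    coeffs = np.polyfit(x, y, deg)
    errs[deg] = float(np.max(np.abs(y - np.polyval(coeffs, x))))
```

On this data, a degree-1 fit misses by hundreds at some points, while degree 2 already meets a "no more than 150 anywhere" requirement; checking `errs` per degree is how to verify the requirement instead of eyeballing single points.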

ValueError: could not broadcast input array from shape (20,590) into shape (20)

一个人想着一个人 posted on 2021-02-10 06:37:07
Question: I am trying to extract features from .wav files using the MFCCs of the sound files. I get an error when I try to convert my list of MFCCs to a NumPy array. I am fairly sure the error occurs because the list contains MFCC arrays with different shapes (but I am unsure how to solve the issue). I have looked at two other Stack Overflow posts, but they don't solve my problem because they are too specific to a particular task. ValueError: could not broadcast input array from
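The error's shapes (20, 590) vs (20,) suggest MFCC matrices with 20 coefficients but differing frame counts, which NumPy cannot stack directly. A common fix is to pad or truncate every matrix to a common number of frames — a sketch with hypothetical shapes:

```python
import numpy as np

# Hypothetical MFCC matrices: 20 coefficients each, but different numbers
# of frames, as clips of different durations would produce.
mfccs = [np.random.rand(20, 590), np.random.rand(20, 431), np.random.rand(20, 612)]

def pad_or_truncate(m, n_frames):
    """Zero-pad (or cut) a (n_mfcc, frames) matrix to exactly n_frames columns."""
    if m.shape[1] >= n_frames:
        return m[:, :n_frames]
    return np.pad(m, ((0, 0), (0, n_frames - m.shape[1])))

max_len = max(m.shape[1] for m in mfccs)
X = np.stack([pad_or_truncate(m, max_len) for m in mfccs])
```

After padding, `np.stack` succeeds because every element has the same shape, giving one array of shape (n_clips, n_mfcc, max_frames).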

I keep getting the error :“ValueError: Expected 2D array, got 1D array instead:” for a linear regression process

蹲街弑〆低调 posted on 2021-02-10 06:29:32
Question: I have two arrays, true_stress and true_strain. I want to run a linear regression on their log10 versions, but I keep getting the said error. from sklearn.linear_model import LinearRegression log_tStress = np.log10(true_stress) log_tStrain = np.log10(true_strain) regressor = LinearRegression() regressor.fit(log_tStrain, log_tStress) predict = regressor.predict(log_tStrain) ValueError: Expected 2D array, got 1D array instead: 回答1: Well, it is just what it says: you are feeding a 1D
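The usual fix is `reshape(-1, 1)` on the feature array, since scikit-learn expects X as a 2D (n_samples, n_features) matrix while the 1D target is fine. A sketch with hypothetical stress/strain values standing in for the question's arrays:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for the question's true_stress/true_strain.
true_strain = np.array([0.01, 0.02, 0.05, 0.1, 0.2])
true_stress = np.array([100.0, 150.0, 240.0, 330.0, 480.0])

log_tStrain = np.log10(true_strain).reshape(-1, 1)  # 2D: (n_samples, 1)
log_tStress = np.log10(true_stress)                 # 1D target is accepted

regressor = LinearRegression().fit(log_tStrain, log_tStress)
predict = regressor.predict(log_tStrain)
```

`predict` comes back as a 1D array of length n_samples; the reshape only changes the layout, not the values.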

How to use part of inputs for training but rest for loss function in Keras

人盡茶涼 posted on 2021-02-10 06:28:19
Question: I am new to Keras and am trying to implement a neural-network model. The input tensor looks like (X1, X2) and the output is (Y). Note that X1 and X2 are correlated. In the model, only X1 will be used for training, but both X1 and X2 will be passed to the loss function, which is a function of X1, X2, y_pred and y_true. Below is pseudocode for the loss function. def customLossFunctionWrapper(input_tensor): def LossFunction(y_pred, y_true): loss_value = f(X1, X2, y_pred, y_true) return loss_value
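The pseudocode's wrapper is a closure: the outer function captures the extra input so the inner function still matches the two-argument signature Keras expects (which is `loss(y_true, y_pred)`, in that order). A framework-free sketch of the pattern, with NumPy standing in for Keras tensors and a hypothetical X2-weighted MSE standing in for `f`:

```python
import numpy as np

def custom_loss_wrapper(x2):
    """Capture the extra tensor X2 so the inner loss keeps Keras's signature."""
    def loss_fn(y_true, y_pred):
        # Hypothetical f: mean squared error weighted by the captured X2 values.
        return float(np.mean(x2 * (y_true - y_pred) ** 2))
    return loss_fn

x2 = np.array([1.0, 2.0, 0.5])
loss = custom_loss_wrapper(x2)
value = loss(np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0, 3.0]))
```

In real Keras code the same closure would be built from the X2 input tensor and passed to `model.compile(loss=custom_loss_wrapper(x2_tensor))`, with TensorFlow ops in place of the NumPy ones.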