text-classification | 易学教程

How to add another feature (length of text) to current bag of words classification? Scikit-learn


I am using bag of words to classify text. It's working well, but I am wondering how to add a feature which is not a word. Here is my sample code:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.multiclass import OneVsRestClassifier

    X_train = np.array(["new york is a hell of a town",
                        "new york was originally dutch",
                        "new york is also called the big apple",
                        "nyc is nice",
                        "the capital of great britain is london. london is
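One way to approach the question above is to compute the extra feature in its own transformer and concatenate it with the bag-of-words columns using scikit-learn's `FeatureUnion`. The following is a minimal sketch, not the asker's code: the toy documents, the labels, and the `text_length` helper are hypothetical, and `TfidfVectorizer` stands in for the `CountVectorizer` plus `TfidfTransformer` pair from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


def text_length(texts):
    """Hypothetical helper: one column with the character count of each text."""
    return np.array([len(t) for t in texts], dtype=float).reshape(-1, 1)


# Toy data for illustration only (the training set in the question is truncated).
docs = [
    "new york is a hell of a town",
    "new york was originally dutch",
    "nyc is nice",
    "london is in the uk",
    "london is in england",
    "london is the capital of great britain",
]
labels = ["new york", "new york", "new york", "london", "london", "london"]

# FeatureUnion runs each transformer on the same input and stacks the
# resulting columns side by side: word features first, then the length.
features = FeatureUnion([
    ("bow", TfidfVectorizer()),
    ("length", FunctionTransformer(text_length, validate=False)),
])

model = Pipeline([("features", features), ("clf", LinearSVC())])
model.fit(docs, labels)

# The combined matrix has one column per vocabulary word plus one for length.
X = model.named_steps["features"].transform(docs)
vocab_size = len(model.named_steps["features"].transformer_list[0][1].vocabulary_)
print(X.shape, vocab_size)
```

Because the raw character count is on a very different scale from TF-IDF weights, scaling that column before it reaches a linear model is probably worth considering.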

Dimension of shape in conv1D


I have tried to build a CNN with one layer, but I have run into a problem. Keras raises the following error:

    ValueError: Error when checking model input: expected conv1d_1_input to have 3 dimensions, but got array with shape (569, 30)

This is the code:

    import numpy
    from keras.models import Sequential
    from keras.layers.convolutional import Conv1D

    numpy.random.seed(7)
    datasetTraining = numpy.loadtxt("CancerAdapter.csv", delimiter=",")
    X = datasetTraining[:,1:31]
    Y = datasetTraining[:,0]
    datasetTesting = numpy.loadtxt("CancereEvaluation.csv", delimiter=",")
    X_test = datasetTraining[:,1:31]
    Y
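The error message says the layer expects a 3-dimensional input while the array is 2-dimensional, `(569, 30)`. A common way to satisfy this, assuming each of the 30 columns is treated as one step with a single channel, is to add a trailing axis. The sketch below uses random numbers in place of the CSV data, which is not available here, and only shows the shape change:

```python
import numpy as np

# Stand-in for the 569 x 30 feature matrix from the question (random values).
X = np.random.rand(569, 30)

# Conv1D consumes input of shape (samples, steps, channels).
# Keep the 30 features as steps and add one channel per step.
X_3d = X.reshape(X.shape[0], X.shape[1], 1)

print(X.shape)     # (569, 30)
print(X_3d.shape)  # (569, 30, 1)
```

With this layout, the first `Conv1D` layer would typically be declared with `input_shape=(30, 1)`, and any test matrix needs the same reshape before prediction.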

Scikit learn - fit_transform on the test set


Question: I am struggling to use Random Forest in Python with scikit-learn. I use it for text classification into 3 classes (positive/negative/neutral), and the features that I extract are mainly words/unigrams, so I need to convert these to numerical features. I found a way to do it with DictVectorizer's fit_transform:

    from sklearn.preprocessing import LabelEncoder
    from sklearn.metrics import classification_report
    from sklearn.feature_extraction import DictVectorizer

    vec =
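The excerpt is cut off before the actual problem is stated. Assuming, as the title suggests, that the issue is how to vectorize the test set, the usual pattern is to call `fit_transform` on the training features only and plain `transform` on the test features, so both matrices share the same columns. A minimal sketch with hypothetical toy feature dictionaries:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical unigram features; the real ones are not shown in the excerpt.
train_feats = [{"good": 1, "movie": 1}, {"bad": 1, "movie": 1}, {"okay": 1}]
train_labels = ["positive", "negative", "neutral"]
test_feats = [{"good": 1, "film": 1}]  # "film" never appears in training

vec = DictVectorizer()
X_train = vec.fit_transform(train_feats)  # learns the feature-to-column mapping
X_test = vec.transform(test_feats)        # reuses it; the unseen "film" is ignored

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_train, train_labels)
pred = clf.predict(X_test)

print(X_train.shape, X_test.shape)
```

Calling `fit_transform` again on the test set would learn a different mapping, so the columns would no longer line up with what the classifier was trained on.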

How can I plot a confusion matrix? [duplicate]


Question: This question already has answers here: How to plot confusion matrix with string axis rather than integer in python (4 answers). Closed last year.

I am using scikit-learn to classify 22,000 text documents into 100 classes. I use scikit-learn's confusion matrix method for computing the confusion matrix.

    model1 = LogisticRegression()
    model1 = model1.fit(matrix, labels)
    pred = model1.predict(test_matrix)
    cm = metrics.confusion_matrix(test_labels, pred)
    print(cm)
    plt.imshow(cm, cmap=
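The snippet in the question is truncated at the `plt.imshow` call. One possible way to finish the idea, sketched here with hypothetical toy labels and an arbitrarily chosen `"Blues"` colormap, is to draw the matrix with `imshow` and put the class names on both axes:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels standing in for test_labels and pred from the question.
test_labels = ["cat", "dog", "cat", "bird", "dog", "bird"]
pred = ["cat", "dog", "dog", "bird", "dog", "cat"]
classes = ["bird", "cat", "dog"]

# Rows are true classes, columns are predicted classes, in the given order.
cm = confusion_matrix(test_labels, pred, labels=classes)

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
fig.colorbar(im, ax=ax)

# Label the ticks with class names instead of integer indices.
ax.set_xticks(np.arange(len(classes)))
ax.set_yticks(np.arange(len(classes)))
ax.set_xticklabels(classes)
ax.set_yticklabels(classes)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")

print(cm)
```

To view the figure, call `plt.show()` in an interactive session or `fig.savefig(...)` with a path of your choice. With 100 classes, as in the question, the tick labels will probably need a larger figure and rotated text to stay readable.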