text-classification

How to add another feature (length of text) to current bag of words classification? Scikit-learn

笑着哭i, submitted on 2019-11-28 09:09:01
I am using bag of words to classify text. It's working well, but I am wondering how to add a feature that is not a word. Here is my sample code.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.multiclass import OneVsRestClassifier

    X_train = np.array(["new york is a hell of a town", "new york was originally dutch", "new york is also called the big apple", "nyc is nice", "the capital of great britain is london. london is
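One common way to combine bag-of-words features with a non-word feature such as text length is scikit-learn's `FeatureUnion`. The sketch below is only an illustration under stated assumptions: the helper `text_length`, the toy documents and the single-label `LinearSVC` setup are made up for this example and are not taken from the excerpt above, which is truncated and appears to use `OneVsRestClassifier`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


def text_length(texts):
    """Hypothetical helper: (n_samples, 1) array with each text's character count."""
    return np.array([len(t) for t in texts], dtype=float).reshape(-1, 1)


# Stack the TF-IDF bag-of-words columns and the length column side by side.
features = FeatureUnion([
    ("bow", Pipeline([
        ("counts", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
    ])),
    ("length", FunctionTransformer(text_length, validate=False)),
])

model = Pipeline([
    ("features", features),
    ("clf", LinearSVC()),
])

# Toy data, assumed for illustration only.
docs = [
    "new york is a hell of a town",
    "new york was originally dutch",
    "nyc is nice",
    "london is in the uk",
    "london is in england",
]
labels = ["new york", "new york", "new york", "london", "london"]

model.fit(docs, labels)
prediction = model.predict(["nyc is a big town"])
```

The length column is on a very different scale from the TF-IDF values, so in practice it may be worth scaling it before it reaches the classifier.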

Dimension of shape in conv1D

醉酒当歌, submitted on 2019-11-27 03:01:02
I tried to build a CNN with one layer, but I have a problem with it. Keras raises this error:

    ValueError: Error when checking model input: expected conv1d_1_input to have 3 dimensions, but got array with shape (569, 30)

This is the code:

    import numpy
    from keras.models import Sequential
    from keras.layers.convolutional import Conv1D

    numpy.random.seed(7)
    datasetTraining = numpy.loadtxt("CancerAdapter.csv",delimiter=",")
    X = datasetTraining[:,1:31]
    Y = datasetTraining[:,0]
    datasetTesting = numpy.loadtxt("CancereEvaluation.csv",delimiter=",")
    X_test = datasetTraining[:,1:31]
    Y
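The error message suggests that the layer wants a 3-dimensional input, which for Keras `Conv1D` is laid out as (samples, steps, channels). A minimal sketch of the usual fix is to add a trailing channel axis to the 2-D array. The zero-filled array below is a stand-in assumed for illustration, since the CSV files from the excerpt are not available; only NumPy is used here.

```python
import numpy as np

# Stand-in for the (569, 30) feature matrix from the error message.
X = np.zeros((569, 30))

# Add a channel axis so each sample becomes 30 steps with 1 channel.
X_3d = np.expand_dims(X, axis=2)

# Equivalent alternative using reshape.
X_3d_alt = X.reshape(X.shape[0], X.shape[1], 1)

print(X_3d.shape)
```

With data shaped this way, the first `Conv1D` layer would typically be given `input_shape=(30, 1)`. Note also that the excerpt builds `X_test` from `datasetTraining` rather than `datasetTesting`, which looks unintended.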

How to add another feature (length of text) to current bag of words classification? Scikit-learn

◇◆丶佛笑我妖孽, submitted on 2019-11-27 02:16:24
Question: I am using bag of words to classify text. It's working well, but I am wondering how to add a feature that is not a word. Here is my sample code.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.multiclass import OneVsRestClassifier

    X_train = np.array(["new york is a hell of a town", "new york was originally dutch", "new

Scikit learn - fit_transform on the test set

折月煮酒, submitted on 2019-11-26 16:52:58
Question: I am struggling to use Random Forest in Python with scikit-learn. I use it for text classification into 3 classes (positive/negative/neutral), and the features I extract are mainly words/unigrams, so I need to convert them to numerical features. I found a way to do it with DictVectorizer's fit_transform:

    from sklearn.preprocessing import LabelEncoder
    from sklearn.metrics import classification_report
    from sklearn.feature_extraction import DictVectorizer

    vec =
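The usual convention for this situation is to call `fit_transform` only on the training set and plain `transform` on the test set, so both matrices share the same columns. The sketch below assumes tiny made-up feature dictionaries and labels; the excerpt above is truncated, so this is not the asker's actual code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Toy unigram features, assumed for illustration only.
train_feats = [
    {"good": 1, "movie": 1},
    {"bad": 1, "movie": 1},
    {"okay": 1, "movie": 1},
]
train_labels = ["positive", "negative", "neutral"]
test_feats = [{"good": 1, "film": 1}]

vec = DictVectorizer()
# Learn the feature-name-to-column mapping from the training data only.
X_train = vec.fit_transform(train_feats)
# Reuse that mapping; the unseen feature "film" is silently dropped.
X_test = vec.transform(test_feats)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_train, train_labels)
pred = clf.predict(X_test)
```

Calling `fit_transform` again on the test set would learn a different column mapping, so the test matrix would no longer line up with what the classifier was trained on.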

How can I plot a confusion matrix? [duplicate]

淺唱寂寞╮, submitted on 2019-11-26 12:54:00
Question: This question already has answers here: How to plot confusion matrix with string axis rather than integer in python (4 answers). Closed last year.

I am using scikit-learn to classify 22,000 text documents into 100 classes. I use scikit-learn's confusion matrix method to compute the confusion matrix.

    model1 = LogisticRegression()
    model1 = model1.fit(matrix, labels)
    pred = model1.predict(test_matrix)
    cm=metrics.confusion_matrix(test_labels,pred)
    print(cm)
    plt.imshow(cm, cmap=\
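One possible way to draw such a matrix with string class names on the axes is matplotlib's `imshow` plus explicit tick labels. This is a sketch under assumptions: the label lists, the `Blues` colormap and the output file name are invented for illustration, and the truncated `cmap` argument in the excerpt is not reconstructed.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; drop this for interactive use
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Toy labels, assumed for illustration only.
test_labels = ["cat", "dog", "cat", "bird", "dog", "bird"]
pred = ["cat", "dog", "dog", "bird", "dog", "cat"]

classes = sorted(set(test_labels))
# Rows are true classes, columns are predicted classes, in the order of `classes`.
cm = confusion_matrix(test_labels, pred, labels=classes)

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
fig.colorbar(im, ax=ax)

# Put the class names on both axes instead of integer indices.
ax.set_xticks(range(len(classes)))
ax.set_yticks(range(len(classes)))
ax.set_xticklabels(classes)
ax.set_yticklabels(classes)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")

fig.savefig("confusion_matrix.png")
```

With 100 classes the tick labels will be crowded, so a larger figure size or rotated x labels may be needed.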