Question
I am building a multiclass classification model using sklearn. I convert my tweets into a 571x1815 sparse matrix with 34737 stored elements in Compressed Sparse Row format. I am trying to predict age groups from each user's tweet history, but I want to add an exogenous categorical variable (gender) to the sparse matrix and then use either a Decision Tree or a Random Forest for the prediction. How do I add a vector to a sparse matrix?
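import nltk
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

def vectorize(df):
    # Bag-of-words counts over unigrams and bigrams.
    # Note: sklearn ignores token_pattern when a tokenizer is supplied.
    bow_transformer = CountVectorizer(tokenizer=nltk.word_tokenize, token_pattern="[a-zA-Z]{2,15}", stop_words="english",
                                      ngram_range=(1, 2), min_df=.01, max_df=.5, max_features=1815)  # 3000
    bow_transformer.fit(df)
    messages_bow = bow_transformer.transform(df)
    # Reweight the raw counts with TF-IDF.
    tfidf_transformer = TfidfTransformer().fit(messages_bow)
    messages_tfidf = tfidf_transformer.transform(messages_bow)
    return messages_tfidf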
[Image: picture of the pandas DataFrame]
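To make the goal concrete, here is a minimal sketch of what "adding a vector to a sparse matrix" could look like, assuming the gender values are available as one label per row in the same order as the tweets. The helper name `add_categorical_column` and the toy data are hypothetical and not from my actual code; the idea is to one-hot encode the labels into a small sparse matrix and stack it column-wise with `scipy.sparse.hstack`.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

def add_categorical_column(features, categories):
    """Append one-hot columns for `categories` to a sparse feature matrix.

    Hypothetical helper: `features` is an (n_rows, n_features) sparse matrix,
    `categories` holds one label per row, in the same row order.
    """
    categories = np.asarray(categories)
    n_rows = features.shape[0]
    if categories.shape[0] != n_rows:
        raise ValueError("need exactly one category value per row")
    # levels: sorted unique labels; codes: index of each row's label in levels
    levels, codes = np.unique(categories, return_inverse=True)
    codes = codes.ravel()
    # Build a sparse one-hot matrix with a single 1 per row.
    one_hot = csr_matrix(
        (np.ones(n_rows), (np.arange(n_rows), codes)),
        shape=(n_rows, len(levels)),
    )
    # Stack column-wise and keep the CSR format for the downstream classifier.
    return hstack([features, one_hot], format="csr"), levels

# Toy stand-in for the TF-IDF matrix (3 documents, 3 terms); values are made up.
tfidf = csr_matrix(np.array([[0.0, 0.5, 0.0],
                             [0.7, 0.0, 0.1],
                             [0.0, 0.0, 0.9]]))
gender = ["female", "male", "female"]

combined, levels = add_categorical_column(tfidf, gender)
```

With this layout the combined matrix keeps the text features in the leading columns and adds one trailing column per gender level, so it could be passed to a tree-based classifier's `fit` as a single feature matrix.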
Source: https://stackoverflow.com/questions/37866390/add-categorical-variablegender-to-sparse-matrix-for-multiclass-classification