I'm working on the following code for performing Random Forest Classification on train and test sets:
from sklearn.ensemble import RandomForestClassifier
fr
I ran into the same error while loading a text dataset with genfromtxt for text classification with Keras.
The data format was: [some_text]\t[class_label]
My understanding was that some characters in the first column confuse the parser, so the two columns cannot be split properly.
data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,1), dtype=str)
This snippet raised the same ValueError as yours. My first workaround was to read everything as one column:
data = np.genfromtxt(my_file, delimiter='\t', usecols=(0), dtype=str)
and split the data myself afterwards.
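For illustration, here is a minimal sketch of doing the split by hand. It uses plain Python line handling rather than genfromtxt, which is a swap on my part, and the sample rows are made up; only the [some_text]\t[class_label] layout comes from my data.

```python
import io

# Hypothetical sample in the [some_text]\t[class_label] format.
raw = io.StringIO("great movie\tpos\nticket #42 was ignored\tneg\n")

texts, labels = [], []
for line in raw:
    line = line.rstrip("\n")
    if not line:
        continue  # skip blank lines
    # Split on the last tab so any tab inside the text stays in the text column.
    text, label = line.rsplit("\t", 1)
    texts.append(text)
    labels.append(label)

print(texts)
print(labels)
```

With a real file you would iterate over open(my_file) instead of the StringIO object.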
However, what finally worked properly was to explicitly set the comments parameter in genfromtxt:
data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,1), dtype=str, comments=None)
According to the documentation:
The optional argument comments is used to define a character string that marks the beginning of a comment. By default, genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any character present after the comment marker(s) is simply ignored.
So the default comment character is '#', and if it appears in your text column, everything after it on that line is ignored. That is probably why genfromtxt cannot recognize the two columns.
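A small self-contained sketch that should reproduce the behaviour. The sample rows and the load helper are hypothetical, and an in-memory StringIO stands in for my_file:

```python
import io
import numpy as np

# Hypothetical sample in the [some_text]\t[class_label] format;
# the second row contains a '#' inside the text column.
SAMPLE = "great movie\tpos\nticket #42 was ignored\tneg\nfine overall\tpos\n"

def load(comments):
    """Parse SAMPLE as two string columns using the given comment marker."""
    return np.genfromtxt(io.StringIO(SAMPLE), delimiter="\t", usecols=(0, 1),
                         dtype=str, comments=comments)

# Default marker '#': the second row is cut off before the tab,
# leaving a single column, so genfromtxt raises a ValueError.
try:
    load("#")
    default_error = None
except ValueError as exc:
    default_error = str(exc)

# comments=None disables comment stripping, so both columns survive.
data = load(None)
texts, labels = data[:, 0], data[:, 1]

print(default_error)
print(texts)
print(labels)
```

If your text can never contain a particular character, passing that character as comments instead of None would be another option, but None is the simplest choice when the file has no comment lines at all.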