Learning decision trees on huge datasets
Question

I'm trying to build a binary classification decision tree from huge datasets (i.e. ones that cannot be stored in memory) using MATLAB. Essentially, what I'm doing is:

1. Collect all the data
2. Try out n decision functions on the data
3. Pick the best decision function to separate the classes within the data
4. Split the original dataset into 2
5. Recurse on the splits

The data has k attributes and a classification, so it is stored as a matrix with a huge number of rows and k+1 columns. The decision