random-forest

Why does pickle.dump(obj) produce a different size than sys.getsizeof(obj)? How to save a variable to a file?

魔方 西西 submitted on 2019-12-12 03:06:30
Question: I use the random forest classifier from Python's scikit-learn library for an exercise. The result changes on each run, so I run it 1000 times and average the results. I save the object rf to a file with pickle.dump() to predict later, and each file is about 4 MB. However, sys.getsizeof(rf) reports just 36 bytes:

    rf = RandomForestClassifier(n_estimators=50)
    rf.fit(matX, vecY)
    pickle.dump(rf, 'var.sav')

My questions: sys.getsizeof() seems to be wrong in getting the size of a RandomForestClassifier object, …
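The two numbers measure different things: sys.getsizeof reports only the shallow size of the outer Python object, while pickling serializes the full recursive state, including every fitted tree. A minimal sketch (the matX/vecY data below is synthetic, standing in for the asker's dataset), which also shows that pickle.dump expects an open binary file handle rather than a filename string:

```python
import pickle
import sys

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the asker's matX / vecY.
matX, vecY = make_classification(n_samples=200, n_features=10, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(matX, vecY)

# sys.getsizeof only measures the wrapper object, not the trees it references.
shallow_size = sys.getsizeof(rf)

# The pickled byte string contains the whole ensemble, i.e. all 50 trees.
deep_size = len(pickle.dumps(rf))

# pickle.dump needs a file object opened in binary mode, not a path string.
with open('var.sav', 'wb') as f:
    pickle.dump(rf, f)

print(shallow_size, deep_size)
```

The gap between the two sizes grows with the number and depth of the trees, which is why the pickled file reaches megabytes while getsizeof stays at a few dozen bytes.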

H2ORandomForestEstimator with min_samples_split?

人盡茶涼 submitted on 2019-12-11 15:49:38
Question: What is the analogue of min_samples_split for H2ORandomForestEstimator and H2OGradientBoostingEstimator? (H2O's min_rows corresponds to sklearn's min_samples_leaf.)

Answer 1: It looks like the closest thing to min_samples_split is min_split_improvement: "Minimum relative improvement in squared error reduction for a split to happen."

Source: https://stackoverflow.com/questions/53642304/h2orandomforestestimator-with-min-samples-split

I have some questions about h2o distributed random forest model

一个人想着一个人 submitted on 2019-12-11 15:26:32
Question: According to the H2O docs, in the FAQ of the DRF section, this note appears under "How does the algorithm handle missing values during training?": "Note: Unlike in GLM, in DRF numerical values are handled the same way as categorical values. Missing values are not imputed with the mean, as is done by default in GLM." I use the DRF algorithm to solve a regression problem, but this note struck me as strange. If I convert all numerical values to categorical values to solve a regression problem, I …

Making Random Forest outputs like Logistic Regression

拥有回忆 submitted on 2019-12-11 10:13:26
Question: I am asking dimension-wise, etc. I am trying to implement this work with a random forest: https://www.kaggle.com/allunia/how-to-attack-a-machine-learning-model/notebook Both the logistic regression and the random forest are from sklearn, but when I get the weights from the random forest model, their shape is (784,), while the logistic regression returns (10, 784). My main problems are dimension mismatches and "NaN, infinity or a value too large for dtype" errors with the attack methods. The weights using logistic regression are …
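The shape difference is expected: LogisticRegression.coef_ holds one weight vector per class, whereas a random forest has no per-class weights at all and only exposes a single feature_importances_ vector. A small sketch using load_digits as a stand-in for MNIST (64 features rather than 784, but the same 10 classes):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# 8x8 digit images: 64 features, 10 classes (a small MNIST stand-in).
X, y = load_digits(return_X_y=True)

logreg = LogisticRegression(max_iter=5000).fit(X, y)
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Logistic regression: one weight vector per class -> (n_classes, n_features).
print(logreg.coef_.shape)             # (10, 64)

# Random forest: a single importance vector shared by all classes -> (n_features,).
print(rf.feature_importances_.shape)  # (64,)
```

So a gradient-style attack that expects per-class weight vectors cannot consume feature_importances_ directly; the two arrays are not comparable objects.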

R special data frame

好久不见. submitted on 2019-12-11 09:43:03
Question: I'm asking a question following the one I asked yesterday in this post: Random Forests for Variables selection. I managed to find, for each quarter, the most significant technical trading rules, and built a data frame holding the names of these TTRs, with one column per quarter:

         1    2    3    4     5    6    7    8    9    10   11
    1    RSI2 RSI3 RSI2 RSI10 RSI2 RSI2 RSI2 RSI2 RSI2 RSI2 RSI2
    2    RSI3 RSI4 RSI3 RSI20 RSI3 RSI3 RSI3 RSI4 RSI4 RSI3 RSI3
    3    RSI4 RSI5 RSI4 EMA5  RSI4 RSI4 RSI5 RSI5 RSI5 RSI4 …

Is this random forest overfitted?

爷，独闯天下 submitted on 2019-12-11 09:34:21
Question: I am training a RandomForestRegressor from the scikit-learn library on temporal data and want the forest to predict the trend (the next 4 points) given date and time as features. I predict the data in small intervals (4 data points) and try to reconstruct the whole day's trend, slicing the dataset to compare against the actual values and compute the MSE. As you can see on the graph below (the first one), the predicted line has some patches that are very similar to the actual data line. The only problem …
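A standard way to answer "is this overfitted?" is to compare the training MSE against the MSE on a held-out, strictly later slice of the series. A minimal sketch on synthetic temporal data (the signal shape and sizes are illustrative assumptions, not the asker's data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic noisy signal standing in for the asker's intraday series.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
y = np.sin(t) + rng.normal(scale=0.3, size=t.size)
X = t.reshape(-1, 1)

# shuffle=False keeps the hold-out set strictly in the "future",
# which is the honest split for a temporal model.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

train_mse = mean_squared_error(y_tr, rf.predict(X_tr))
test_mse = mean_squared_error(y_te, rf.predict(X_te))

# A large gap between the two is the classic overfitting signal; with time
# itself as a feature, the gap also reflects that trees cannot extrapolate
# beyond the range of timestamps seen in training.
print(train_mse, test_mse)
```

If the test set is instead sampled from inside the training time range (shuffled split), the gap shrinks dramatically, which is exactly why patches of the reconstructed day can look deceptively close to the actual line.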

Extract a subset of tree from random forest model for prediction

风流意气都作罢 submitted on 2019-12-11 09:33:48
Question: From Liaw's "Classification and Regression by randomForest" paper: "The best way to determine how many trees are necessary is to compare predictions made by a forest to predictions made by a subset of forest." I am wondering if there is a way to extract a subtree for prediction with R's randomForest package. getTree seems only to print out the structure. Any suggestion would be greatly appreciated.

Answer 1: Try predict(rf, dat, predict.all=TRUE) in randomForest; you can get predictions from all the sub-trees.
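The answer above is specific to R's predict.all=TRUE. For readers working in scikit-learn, the same forest-vs-subset comparison can be sketched by averaging per-tree predictions from the fitted estimators_ list (the data and the subset size k below are synthetic/illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data standing in for the asker's dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Per-tree predictions: shape (n_trees, n_samples).
all_preds = np.stack([tree.predict(X) for tree in rf.estimators_])

# Prediction of the first k trees only -- the "subset of the forest"
# that Liaw suggests comparing against the full forest.
k = 25
subset_pred = all_preds[:k].mean(axis=0)
full_pred = all_preds.mean(axis=0)

# For a regressor, the full-forest mean of per-tree predictions
# reproduces rf.predict.
print(np.allclose(full_pred, rf.predict(X)))  # True

# How far off is the 25-tree subset from the 100-tree forest?
print(np.mean((subset_pred - full_pred) ** 2))
```

Plotting this subset-vs-forest discrepancy as k grows gives the "how many trees are enough" curve the paper describes.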

R- Random forest predict fails with NAs in predictors

狂风中的少年 submitted on 2019-12-11 07:16:03
Question: The documentation (if I'm reading it correctly) says that the random forest predict function produces NA predictions if it encounters NA predictors for certain observations: "NOTE: If the object inherits from randomForest.formula, then any data with NA are silently omitted from the prediction. The returned value will contain NA correspondingly in the aggregated and individual tree predictions (if requested), but not in the proximity or node matrices." However, if I try to use the predict …
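For comparison, scikit-learn's random forest does not silently omit rows with NA at all: it refuses to predict on data containing NaN. A minimal sketch on synthetic data (the mask-based workaround at the end is an illustrative assumption, not part of either library's API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

X_new = X[:5].copy()
X_new[0, 0] = np.nan  # one observation with a missing predictor

# Unlike R's randomForest.formula interface, which silently drops NA rows,
# scikit-learn raises a ValueError when asked to predict on NaN.
try:
    rf.predict(X_new)
    nan_rejected = False
except ValueError:
    nan_rejected = True

# Illustrative workaround: predict only on complete rows, keep NaN elsewhere,
# mimicking the "NA correspondingly in the predictions" behaviour of R.
mask = ~np.isnan(X_new).any(axis=1)
preds = np.full(len(X_new), np.nan)
preds[mask] = rf.predict(X_new[mask])

print(nan_rejected)  # True
print(preds)
```

This makes the R documentation's behaviour the more forgiving of the two: it returns positionally aligned NAs, while sklearn forces the caller to handle missing predictors explicitly.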

Spark|ML|Random Forest|Load trained model from .txt of RandomForestClassificationModel. toDebugString

こ雲淡風輕ζ submitted on 2019-12-11 05:42:59
Question: Using Spark 1.6 and the ML library, I am saving the result of a trained RandomForestClassificationModel using toDebugString:

    val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel]
    val stringModel = rfModel.toDebugString
    // save stringModel to a .txt file on the driver

My idea is that in the future I can read the .txt file and load the trained random forest back. Is this possible? Thanks!

Answer 1: That won't work. toDebugString is merely debug info to understand how it's …

Do I need to normalize (or scale) the data for Random Forest (DRF) or Gradient Boosting Machine (GBM) in H2O, or in general? [closed]

蓝咒 submitted on 2019-12-11 05:39:30
Question: I am creating classification and regression models using Random Forest (DRF) and GBM in H2O.ai. I believe that I don't need to normalize (or scale) the data, as it's unnecessary and potentially harmful: it might smooth out the nonlinear nature of the model. Could you please confirm if my …
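The intuition is correct for tree-based models, and it can be checked empirically. The sketch below uses scikit-learn's random forest as a stand-in for H2O's DRF (an assumption; H2O behaves analogously): fitting on raw and standardized copies of the same synthetic data with the same seed yields identical predictions, because tree splits depend only on the ordering of feature values, which an affine rescaling preserves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Same seed -> same bootstrap samples and candidate features in both fits.
rf_raw = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
rf_scaled = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_scaled, y)

# A split compares a feature against a threshold; rescaling moves the
# thresholds but not which samples fall on each side, so the trees make
# the same partitions and the predicted classes match.
same = np.array_equal(rf_raw.predict(X), rf_scaled.predict(X_scaled))
print(same)
```

The caveat the question hints at is also real: transforming the *targets* (or applying non-monotonic feature transforms) can change the model, so "no scaling needed" applies to monotonic rescaling of the predictors only.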