XGBoost

xgboost.XGBClassifier

booster='gbtree': type of booster to use: gbtree, gblinear, or dart
silent=True: whether to suppress log output during training
n_jobs=1: number of parallel threads

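learning_rate=0.1: learning rate (shrinkage applied to each new tree), similar in role to the step size in gradient descent
max_depth=3: maximum depth of each tree
gamma=0: minimum loss reduction required to split a leaf node further
n_estimators=100: number of trees to fit, which can be thought of as the number of training rounds
min_child_weight=1: minimum sum of instance weights (Hessian) required in a child node
subsample=1: fraction of training samples drawn for each tree (row sampling)
colsample_bytree=1: fraction of features drawn for each tree (column sampling)
reg_alpha=0: L1 regularization coefficient
reg_lambda=1: L2 regularization coefficient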
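
To make the row/column distinction between subsample and colsample_bytree concrete, here is a minimal conceptual sketch. The helper name sample_rows_and_columns is hypothetical, and the library's internal sampling scheme may well differ; the sketch only shows which indices one tree would see under the stated fractions.

```python
import random

def sample_rows_and_columns(n_rows, n_cols, subsample, colsample_bytree, seed=0):
    """Pick the row and column indices that a single tree would be trained on.

    Hypothetical helper for illustration only, not part of xgboost.
    """
    rng = random.Random(seed)
    n_row_keep = max(1, int(n_rows * subsample))         # rows: subsample
    n_col_keep = max(1, int(n_cols * colsample_bytree))  # columns: colsample_bytree
    rows = sorted(rng.sample(range(n_rows), n_row_keep))
    cols = sorted(rng.sample(range(n_cols), n_col_keep))
    return rows, cols

rows, cols = sample_rows_and_columns(
    n_rows=10, n_cols=4, subsample=0.8, colsample_bytree=0.5
)
print(len(rows), len(cols))  # 8 2
```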

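objective='binary:logistic': the learning task and its corresponding objective function

"reg:linear": linear regression
"reg:logistic": logistic regression
"binary:logistic": logistic regression for binary classification, outputs probabilities
"binary:logitraw": logistic regression for binary classification, outputs the raw score before the logistic transformation
"multi:softmax": multiclass classification with the softmax objective, outputs the predicted class
"multi:softprob": same as multi:softmax, but outputs a probability for each class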

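random_state=0: random seed
missing=None: value in the data to be treated as missing (None defaults to np.nan)
max_delta_step=0: maximum delta step allowed for each tree's weight estimate (0 means no constraint)
colsample_bylevel=1: fraction of features drawn at each tree level
scale_pos_weight=1: weight balance between the positive and negative classes
base_score=0.5: initial prediction score for all instances (global bias)
nthread=None: deprecated, use n_jobs instead
seed=None: deprecated, use random_state instead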
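
The difference between the "binary:logistic" and "binary:logitraw" objectives listed above comes down to one transformation: the raw score (margin) passed through the logistic function gives the probability. A small sketch in plain Python, with made-up raw scores:

```python
import math

def sigmoid(margin):
    """Logistic transformation: maps a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))

# Made-up raw scores, as "binary:logitraw" would output them
raw_scores = [-2.0, 0.0, 3.0]

# What "binary:logistic" would report for the same samples
probabilities = [sigmoid(s) for s in raw_scores]
print([round(p, 4) for p in probabilities])  # [0.1192, 0.5, 0.9526]
```

A raw score of 0 corresponds to a probability of exactly 0.5, so thresholding the probability at 0.5 is the same as thresholding the raw score at 0.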

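Reduce model complexity to limit overfitting: max_depth, min_child_weight, and gamma
Sample the training data randomly: subsample (rows), colsample_bytree (columns)
Lower the learning rate and increase the number of training rounds accordingly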
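
The last tip can be sketched as a small helper that trades learning rate for rounds. The helper slow_down is hypothetical, and scaling the two by the same factor is only a rule-of-thumb assumption; the values 0.8 for the sampling ratios are likewise illustrative starting points, not recommendations from the source.

```python
def slow_down(params, factor):
    """Return a copy with learning_rate divided and n_estimators multiplied by factor.

    Hypothetical helper; the inverse-proportional scaling is an assumption.
    """
    adjusted = dict(params)
    adjusted["learning_rate"] = params["learning_rate"] / factor
    adjusted["n_estimators"] = int(params["n_estimators"] * factor)
    return adjusted

# Defaults as listed in the parameter notes
defaults = {
    "learning_rate": 0.1,
    "n_estimators": 100,
    "max_depth": 3,
    "subsample": 1.0,
    "colsample_bytree": 1.0,
}

conservative = slow_down(defaults, factor=2)
conservative.update(subsample=0.8, colsample_bytree=0.8)  # illustrative values
print(conservative["learning_rate"], conservative["n_estimators"])  # 0.05 200
```

The resulting dictionary uses only keyword names from the parameter list, so it could be unpacked into the classifier constructor.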

1.2.1 fit

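X: feature matrix
y: labels
sample_weight=None: weight of each sample
eval_set=None: list of (X, y) validation sets, used to monitor training and trigger early stopping
eval_metric=None: evaluation metric

"rmse": root mean squared error
"mae": mean absolute error
"logloss": negative log-likelihood
"error": binary classification error rate, with a threshold of 0.5
"error@t": like error, but with threshold t
"mlogloss": multiclass logloss
"auc": area under the ROC curve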

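early_stopping_rounds=None: stop training if the validation metric has not improved for this many rounds
verbose=True: whether to print the evaluation metric on eval_set during training
xgb_model=None: a previously saved model to continue training from
sample_weight_eval_set=None: list of sample-weight arrays, one per validation set in eval_set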
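
As a rough illustration of how eval_metric and early_stopping_rounds interact, the sketch below computes "error", "error@t" and "logloss" by hand and then applies a simple early-stopping rule to a made-up sequence of per-round validation scores. All function names and data are hypothetical, and the library's own bookkeeping may differ in detail.

```python
import math

def error_at(y_true, y_prob, t=0.5):
    """Binary error rate: a sample is predicted positive when its probability exceeds t."""
    wrong = sum(1 for y, p in zip(y_true, y_prob) if (p > t) != (y == 1))
    return wrong / len(y_true)

def logloss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def early_stopping_round(scores, early_stopping_rounds):
    """Given per-round validation scores (lower is better), return
    (best_round, last_round_trained) under a simple patience rule."""
    best_score = float("inf")
    best_round = -1
    for round_idx, score in enumerate(scores):
        if score < best_score:
            best_score, best_round = score, round_idx
        elif round_idx - best_round >= early_stopping_rounds:
            return best_round, round_idx  # no improvement for too long
    return best_round, len(scores) - 1

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.7]
print(error_at(y_true, y_prob))           # 0.25
print(error_at(y_true, y_prob, t=0.3))    # 0.5
print(round(logloss(y_true, y_prob), 4))  # 0.5827

# Made-up validation logloss, one value per training round
val_logloss = [0.60, 0.52, 0.48, 0.49, 0.50, 0.47, 0.48, 0.49, 0.50, 0.51]
print(early_stopping_round(val_logloss, early_stopping_rounds=3))  # (5, 8)
```

In the example, round 5 has the best score and training stops at round 8, after three rounds without improvement.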

1.2.2 predict(data, output_margin=False, ntree_limit=0)

Returns the predicted class labels as an np.array; the decision threshold cannot easily be adjusted.

1.2.3 predict_proba(data, ntree_limit=0)

Returns, for each sample, the probability of belonging to each class.
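
Because predict gives no handle on the threshold, a common workaround is to threshold the output of predict_proba yourself. A minimal sketch, assuming a binary problem where each row holds [P(class 0), P(class 1)]; the probabilities are made up and the helper name is hypothetical:

```python
def classify_with_threshold(proba_rows, threshold=0.5):
    """Label a sample 1 when its positive-class probability reaches the threshold."""
    return [1 if row[1] >= threshold else 0 for row in proba_rows]

# Made-up output in the shape predict_proba would return for 4 samples
proba = [[0.9, 0.1], [0.65, 0.35], [0.4, 0.6], [0.2, 0.8]]

print(classify_with_threshold(proba))                 # [0, 0, 1, 1]
print(classify_with_threshold(proba, threshold=0.3))  # [0, 1, 1, 1]
```

Lowering the threshold flags more samples as positive, which trades precision for recall.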

https://github.com/dmlc/xgboost/tree/master/demo

LightGBM

lightgbm.LGBMClassifier

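boosting_type='gbdt': type of boosting: gbdt, dart, goss, or rf
num_leaves=31: maximum number of leaves per tree
max_depth=-1: maximum tree depth (-1 means no limit)
learning_rate=0.1: boosting learning rate
n_estimators=10: number of trees to fit, equivalent to the number of training rounds
subsample=1.0: fraction of training samples drawn (row sampling)
subsample_freq=1: how often subsampling is performed, in iterations
colsample_bytree=1.0: fraction of features drawn for each tree (column sampling)
reg_alpha=0.0: L1 regularization coefficient
reg_lambda=0.0: L2 regularization coefficient
random_state=None: random seed
n_jobs=-1: number of parallel threads
silent=True: whether to suppress log output during training
max_bin=255: maximum number of bins that feature values are bucketed into
subsample_for_bin=200000: number of samples used to construct the bins
objective=None: learning objective (None selects a default for the estimator type)
min_split_gain=0.0: minimum gain required to make a split
min_child_weight=0.001: minimum sum of instance weights (Hessian) required in a child (leaf)
min_child_samples=20: minimum number of samples required in a child (leaf)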
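
num_leaves and max_depth both cap tree size, and it can help to check which of the two is actually the binding limit. The sketch below relies only on the fact that a binary tree of depth d has at most 2^d leaves; the helper name effective_leaf_cap is hypothetical and not part of lightgbm.

```python
def effective_leaf_cap(num_leaves, max_depth):
    """Largest number of leaves a tree can have under both limits.

    A non-positive max_depth is treated as "no depth limit".
    """
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, 2 ** max_depth)

print(effective_leaf_cap(num_leaves=31, max_depth=-1))  # 31
print(effective_leaf_cap(num_leaves=31, max_depth=3))   # 8
print(effective_leaf_cap(num_leaves=31, max_depth=7))   # 31
```

With the defaults, num_leaves alone controls tree size; once max_depth is set to a small value, raising num_leaves beyond 2^max_depth has no further effect.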

n_features_
classes_
n_classes_
best_score_
best_iteration_
objective_
booster_
evals_result_
feature_importances_

fit

predict_proba

Source: XGBoost、LightGBM参数讲解及实战 (XGBoost and LightGBM: Parameter Explanations and Practice)
