sklearn-pandas

Python Data Analysis in Practice: Expert-Level (Show-Off) Data Preprocessing

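做~自己de王妃 submitted on 2020-08-06 21:08:26
Author: 琥珀里有波罗的海 https://zhuanlan.zhihu.com/p/146906814

Preface: My earlier posts were rather dry, and every one of them was enormously long, as if a single article had to cover a dataset from first look all the way to finished predictions, with my own "thought process" and "detours" added on top. This time we deliberately picked a dataset everyone has already seen, Titanic (reply "Titanic" in the account backend to get it), and wrote only about data preprocessing, but in an expert (read: show-off) coding style. Obviously I am no expert; I was just lucky enough to have been trained.

When it comes to preprocessing, the usual needs are: handling missing numeric values; handling missing categorical values; standardizing numeric features; and turning categorical features into dummy variables.

The Pipeline idea: In data processing and machine learning you eventually notice that every project seems to follow the same "routine": preprocessing, modeling, training, prediction. Preprocessing is itself a routine, but for it we use FeatureUnion rather than Pipeline. Of course, no single tool solves every problem, so we work through a hands-on example to see which functions and which coding style make the code look well organized and full of "expert" flair.

Importing the data and getting started: The Titanic data we analyze today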
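The excerpt stops before any code appears, so the following is only a minimal sketch of how the four preprocessing needs listed above could be combined with scikit-learn's `FeatureUnion`. It is not the author's code. The tiny DataFrame, the column names, the `select` helper, and the choice of median and most-frequent imputation are all assumptions made for illustration, and the sketch uses `Pipeline` inside each branch for convenience.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical stand-in for the Titanic data (values are made up).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, np.nan, 53.10],
    "Embarked": pd.Series(["S", "C", np.nan, "S"], dtype=object),
})

num_cols = ["Age", "Fare"]   # assumed numeric features
cat_cols = ["Embarked"]      # assumed categorical features


def select(columns):
    """Hypothetical helper: a transformer that keeps only the given columns."""
    return FunctionTransformer(lambda frame: frame[columns])


# Numeric branch: fill missing values, then standardize.
numeric = Pipeline([
    ("select", select(num_cols)),
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical branch: fill missing values, then create dummy variables.
categorical = Pipeline([
    ("select", select(cat_cols)),
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("dummy", OneHotEncoder(handle_unknown="ignore")),
])

# FeatureUnion runs both branches on the same input and stacks the
# resulting columns side by side.
preprocess = FeatureUnion([("num", numeric), ("cat", categorical)])
features = preprocess.fit_transform(df)
```

Because the one-hot branch returns a sparse matrix by default, `features` may be sparse; calling `features.toarray()` gives a dense array if one is needed.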

What is the difference between x_test, x_train, y_test, y_train in sklearn?

微笑、不失礼 submitted on 2020-07-20 06:34:55
Question: I'm learning sklearn and I don't fully understand the difference between the four outputs of the train_test_split function, or why all four are needed. I found some examples in the documentation, but they were not enough to resolve my doubts. Does the code use x_train to predict x_test, or use x_train to predict y_test? What is the difference between train and test? Do I use the training set to predict the test set, or something similar? I'm very confused about it. Below is the example provided in the
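The question's own example is cut off in this excerpt, so here is a small sketch of how the four outputs are typically used. The toy data and the choice of `LogisticRegression` are assumptions for illustration only. The model is fitted on `X_train` and `y_train`, it predicts labels from `X_test`, and those predictions are compared with `y_test`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each, and a binary label.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# X_* holds the features, y_* holds the labels; train and test are
# disjoint subsets of the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)            # learn from training features and labels

y_pred = model.predict(X_test)         # predict labels for unseen features
accuracy = model.score(X_test, y_test) # compare predictions with the held-out labels
```

So `x_train` is never used to predict `x_test`: the training pair teaches the model, and the test pair checks how well it generalizes.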

What is the difference between x_test, x_train, y_test, y_train in sklearn?

ε祈祈猫儿з submitted on 2020-07-20 06:34:08
Question: I'm learning sklearn and I don't fully understand the difference between the four outputs of the train_test_split function, or why all four are needed. I found some examples in the documentation, but they were not enough to resolve my doubts. Does the code use x_train to predict x_test, or use x_train to predict y_test? What is the difference between train and test? Do I use the training set to predict the test set, or something similar? I'm very confused about it. Below is the example provided in the