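[Keras Study Notes] 2: Multiple Linear Regression

Multiple Linear Regression

Multiple linear regression is equivalent to a single layer of a neural network: y = w·x + b, where w and x are vectors of the same dimension (greater than 1). In other words, the input features x1, x2, … are combined with the parameters w and b to predict the value of y.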
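As a minimal sketch of the formula above, the prediction can be computed directly with NumPy. The values of `w`, `b`, `x` and `X` here are hypothetical and chosen only for illustration; they are not taken from the house-price data.

```python
import numpy as np

# Hypothetical parameters and inputs, for illustration only.
w = np.array([2.0, -1.0, 0.5])   # one weight per feature
b = 4.0                          # bias (intercept)
x = np.array([1.0, 3.0, 2.0])    # a single sample with three features

# Prediction for one sample: dot product of w and x, plus the bias.
y_single = np.dot(w, x) + b

# Prediction for a batch: each row of X is one sample.
X = np.array([[1.0, 3.0, 2.0],
              [0.0, 1.0, 4.0]])
y_batch = X @ w + b

print(y_single)
print(y_batch)
```

A Dense layer with one output unit and no activation performs this same computation, with w and b learned from data rather than fixed by hand.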

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Kaggle house-price training data
df = pd.read_csv("./data/houseprice.csv")
df.head()

	Id	MSSubClass	MSZoning	LotFrontage	LotArea	Street	Alley	LotShape	LandContour	Utilities	...	PoolQC	Fence	MiscFeature	MoSold	YrSold	SaleType	SaleCondition	SalePrice
0	1	60	RL	65.0	8450	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	2	2008	WD	Normal	208500
1	2	20	RL	80.0	9600	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	5	2007	WD	Normal	181500
2	3	60	RL	68.0	11250	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	9	2008	WD	Normal	223500
3	4	70	RL	60.0	9550	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	2	2006	WD	Abnorml	140000
4	5	60	RL	84.0	14260	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	12	2008	WD	Normal	250000

5 rows × 81 columns

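import numpy as np
# Handle outliers: keep only rows with GarageArea below 1200
train = df[df['GarageArea'] < 1200]
# Handle missing values: for the numeric columns, first fill gaps with the default interpolate(), then drop any rows that still contain NaN
train = train.select_dtypes(include=[np.number]).interpolate().dropna()
train.head()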

	Id	MSSubClass	LotFrontage	LotArea	OverallQual	OverallCond	YearBuilt	YearRemodAdd	MasVnrArea	BsmtFinSF1	...	WoodDeckSF	OpenPorchSF	EnclosedPorch	MoSold	YrSold	SalePrice
0	1	60	65.0	8450	7	5	2003	2003	196.0	706	...	0	61	0	2	2008	208500
1	2	20	80.0	9600	6	8	1976	1976	0.0	978	...	298	0	0	5	2007	181500
2	3	60	68.0	11250	7	5	2001	2002	162.0	486	...	0	42	0	9	2008	223500
3	4	70	60.0	9550	7	5	1915	1970	0.0	216	...	0	35	272	2	2006	140000
4	5	60	84.0	14260	8	5	2000	2000	350.0	655	...	192	84	0	12	2008	250000

5 rows × 38 columns
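To see why `dropna()` is still needed after `interpolate()`, here is a small sketch on a toy DataFrame. The column names and values are made up for illustration. With its default settings, `interpolate()` fills gaps linearly in the forward direction, so a NaN at the very start of a column has no earlier value to interpolate from and is left in place.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration (not the Kaggle data set).
toy = pd.DataFrame({
    "area": [100.0, np.nan, 300.0, 400.0],   # gap in the middle
    "rooms": [np.nan, 2.0, 3.0, 4.0],        # gap at the start
    "street": ["Pave", "Pave", "Grvl", "Pave"],
})

# Keep only numeric columns, as in the cell above.
numeric = toy.select_dtypes(include=[np.number])

# Default linear interpolation fills the middle gap in "area",
# but the leading NaN in "rooms" stays NaN.
filled = numeric.interpolate()

# Drop the rows that still contain NaN.
cleaned = filled.dropna()

print(filled)
print(cleaned)
```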

# Select the features and the target
x = train.iloc[:, 1:37]
y = train.iloc[:, -1]

import keras
# Initialize the model
model = keras.Sequential()

Using TensorFlow backend.

# Add a fully connected layer (output dimension 1, input dimension 36)
from keras import layers
model.add(layers.Dense(1, input_dim=36))

WARNING:tensorflow:From E:\MyProgram\Anaconda\envs\krs\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 1)                 37
=================================================================
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________

# Compile the model, specifying the optimizer and the loss to minimize
model.compile(
    optimizer='adam',
    loss='mse'
)

# Train the model
model.fit(x, y, epochs=3000, verbose=0)
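The 37 parameters in the summary are the 36 weights (one per input feature) plus one bias. A short sketch of that count follows; `dense_param_count` is a hypothetical helper written for this note, not a Keras function.

```python
def dense_param_count(input_dim: int, units: int) -> int:
    """Trainable parameters of a fully connected layer that uses a bias."""
    weights = input_dim * units   # one weight per (input, unit) pair
    biases = units                # one bias per output unit
    return weights + biases

# The layer in this note: 36 input features, 1 output unit.
params = dense_param_count(input_dim=36, units=1)
print(params)
```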

WARNING:tensorflow:From E:\MyProgram\Anaconda\envs\krs\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.

<keras.callbacks.History at 0x13bf6780>

model.predict(x)

array([[200148.9  ],
       [188514.89 ],
       [205523.38 ],
       ...,
       [218126.67 ],
       [115857.586],
       [191505.69 ]], dtype=float32)

y.head()

0    208500
1    181500
2    223500
3    140000
4    250000
Name: SalePrice, dtype: int64
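To compare predictions with targets more concretely, the following sketch computes the absolute error and RMSE for the first three rows only, using the predicted and actual values printed above. It is an illustration on three samples, not an evaluation of the model; the variable names are chosen for this example.

```python
import numpy as np

# First three predictions and targets, copied from the outputs above.
pred = np.array([200148.9, 188514.89, 205523.38])
actual = np.array([208500, 181500, 223500])

# Per-sample absolute error.
abs_err = np.abs(pred - actual)

# Root mean squared error over these three samples only.
rmse = np.sqrt(np.mean((pred - actual) ** 2))

print(abs_err)
print(rmse)
```

With the full data, the same calculation could be applied to `model.predict(x).ravel()` and `y.values`. Note that this measures error on the training data itself, since the note does not hold out a validation set.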

Source: https://blog.csdn.net/SHU15121856/article/details/89286220
