Implementing Linear Regression in Python

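Definition: linear regression assumes a linear relationship between the features and the target, trains a model on the given training data, and uses that model to make predictions.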

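We start with the simplest case, a linear relationship in one variable. Suppose a data set follows the form y = theta * x, with x = {0, 1, 2, 3, 4, 5} and y = {0, 20, 60, 68, 77, 110}. From x and y we estimate an approximate value of the parameter theta, which gives the model y = theta * x. The model can then predict y for a given x.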
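As a minimal sketch of this one-parameter fit, the snippet below estimates theta from the six data points given above. It assumes the squared-error criterion (introduced later in the article) and the no-intercept model y = theta * x. The helper name `predict` is only illustrative.

```python
import numpy as np

# Data from the example above.
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 20, 60, 68, 77, 110], dtype=float)

# For the no-intercept model y = theta * x, minimizing the sum of squared
# errors gives theta = sum(x * y) / sum(x * x).
theta = np.sum(x * y) / np.sum(x * x)


def predict(x_new):
    """Predict y for a new x using the fitted theta (illustrative helper)."""
    return theta * x_new


print(theta)       # about 21.85
print(predict(6))  # about 131.1
```

The fitted line does not pass through every point. It is the single slope that keeps the total squared error over all six points as small as possible.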

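The classic expression for linear regression writes Y as a weighted sum of the features, with the parameters theta as the weights. Without further constraints theta can take many different values and is not determined, so a loss function is needed to find the best theta.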
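For reference, the model and a commonly used squared-error loss can be written as follows. The notation here is an assumption, not taken from the original figure: m is the number of training samples, n the number of features, x_0 = 1 is a constant term for the intercept, and J is the name given to the loss.

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{T} x,
\qquad x_0 = 1

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2
```

The best theta is the one that minimizes J(theta), that is, the one whose predictions are closest to the observed y values on average.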

There are two common ways to solve for theta through the loss function: least squares and gradient descent.

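Least squares gives theta directly in closed form through the normal equation, theta = (X^T X)^(-1) X^T y. The computation can consume a lot of CPU and other resources, however, because it requires inverting X^T X, which becomes expensive as the number of features grows.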
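The closed-form solution can be sketched on synthetic data as below. The data is a hypothetical stand-in, not the article's data.csv: the 150 x 3 feature shape, the `true_theta` values, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for a real data set: 150 samples, 3 features (assumed).
rng = np.random.default_rng(0)
features = rng.normal(size=(150, 3))
X = np.hstack([np.ones((150, 1)), features])  # prepend x0 = 1 for the intercept
true_theta = np.array([[2.0], [0.5], [-1.0], [3.0]])  # hypothetical parameters
y = X @ true_theta + 0.01 * rng.normal(size=(150, 1))

# Normal equation: theta = (X^T X)^(-1) X^T y.
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same solution without forming the inverse explicitly.
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# NumPy's least-squares routine, for comparison.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_inv.ravel())
print(np.allclose(theta_inv, theta_solve), np.allclose(theta_inv, theta_lstsq))
```

All three calls solve the same least-squares problem. In practice `np.linalg.solve` or `np.linalg.lstsq` is usually preferred over an explicit inverse, since both avoid computing the inverse matrix and tend to be more numerically stable.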

The data.csv file used in the code, together with the source code: https://gitee.com/wangfuchao/linear_regression.git

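import numpy as np
from numpy.linalg import inv
from numpy import dot

import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes.
dataSet = pd.read_csv(r'D:\projects\ai\linear_regression\linear_regression\data.csv')

# print(dataSet)

# Columns 2-4 are the features; add a constant column x0 = 1 for the intercept.
temp = dataSet.iloc[:, 2:5].copy()
temp['x0'] = 1
x = temp.iloc[:, [3, 0, 1, 2]]  # reorder so that x0 comes first
# print(x)
y = dataSet.iloc[:, 1].values.reshape(150, 1)  # column 1 is the target

# Least squares (normal equation): theta = (X^T X)^(-1) X^T y
theta = dot(dot(inv(dot(x.T, x)), x.T), y)
print(theta)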

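# Gradient descent: start from theta = 1 and step repeatedly against the gradient.
theta = np.array([1., 1., 1., 1.]).reshape(4, 1)
alpha = 0.1  # learning rate
temp = theta  # temp refers to the same array as theta
x0 = x.iloc[:, 0].values.reshape(150, 1)
x1 = x.iloc[:, 1].values.reshape(150, 1)
x2 = x.iloc[:, 2].values.reshape(150, 1)
x3 = x.iloc[:, 3].values.reshape(150, 1)

for i in range(10000):
    # Assumed form of the update: the standard gradient-descent rule for the
    # squared-error loss, applied to one parameter at a time over 150 samples.
    temp[0] = theta[0] + alpha * np.sum((y - dot(x, theta)) * x0) / 150.
    temp[1] = theta[1] + alpha * np.sum((y - dot(x, theta)) * x1) / 150.
    temp[2] = theta[2] + alpha * np.sum((y - dot(x, theta)) * x2) / 150.
    temp[3] = theta[3] + alpha * np.sum((y - dot(x, theta)) * x3) / 150.
    theta = temp

print(theta)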
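As a point of comparison, gradient descent can also be sketched in vectorized form, updating all parameters at once with theta := theta - alpha * X^T (X theta - y) / m. This is a sketch under stated assumptions rather than the article's exact code. The data is synthetic, and the function name `gradient_descent`, its parameter names, and `true_theta` are hypothetical.

```python
import numpy as np

# Synthetic stand-in data: 150 samples, intercept column plus 3 features (assumed).
rng = np.random.default_rng(0)
m = 150
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 3))])
true_theta = np.array([[2.0], [0.5], [-1.0], [3.0]])  # hypothetical parameters
y = X @ true_theta + 0.01 * rng.normal(size=(m, 1))


def gradient_descent(X, y, alpha=0.1, n_iters=10000):
    """Batch gradient descent for the mean squared-error loss."""
    n_samples, n_params = X.shape
    theta = np.ones((n_params, 1))  # same starting point as in the article
    for _ in range(n_iters):
        # Gradient of J(theta) = 1/(2m) * sum((X theta - y)^2).
        gradient = X.T @ (X @ theta - y) / n_samples
        # Simultaneous update of every parameter.
        theta = theta - alpha * gradient
    return theta


theta_gd = gradient_descent(X, y)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_gd.ravel())
print(np.allclose(theta_gd, theta_ls, atol=1e-6))
```

On well-scaled features like these, the iterates approach the least-squares solution. On unscaled real data the learning rate alpha may need to be smaller, because too large a step makes the iterations diverge instead of converge.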
