Simple multidimensional curve fitting

天涯浪人 2020-12-24 08:52

I have a bunch of data, generally in the form a, b, c, ..., y

where y = f(a, b, c...)

Most of the data sets have three or four variables and between 10k and 10M records.

6 Answers
  •  忘掉有多难
    2020-12-24 09:46

    Short answer: it isn't so simple. Consider a non-parametric approach on subsets of the data.

    There are two main questions you need to decide: (1) Do you actually care about the parameters of the function, i.e. your P1, E1, ..., or would you be okay with just estimating the mean function? (2) Do you really need to estimate the function on all of the data?

    The first thing I'll mention is that your specified function is non-linear (in the parameters to be estimated), so ordinary least squares won't work directly. Let's pretend that you specified a linear function. Linear regression can be performed efficiently using QR factorization, which costs O(n * p^2), where n is the number of records and p is the number of parameters you are trying to estimate. That is linear in n, but with 10M values you still have to hold or stream the whole design matrix. If you want to estimate the non-linear mean function it gets much worse, because an iterative optimizer repeats comparable work at every step and may need good starting values to converge.
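To make the linear case concrete, here is a minimal sketch, assuming a model of the form y = b0 + b1*a + b2*b + b3*c (the question's real function is not linear, so this is only the "pretend" case). It uses the normal equations rather than QR, because they can be accumulated in one pass without storing the data; they are less numerically stable than QR on ill-conditioned problems. The names `fit_linear` and `solve` and the synthetic data are illustrative, not from the original post.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    rows is any iterable of (x1, ..., xk, y) tuples. The data is read once,
    and each row costs O(p^2) work, so the total is O(n * p^2).
    """
    gram = None  # accumulates X^T X (p x p)
    rhs = None   # accumulates X^T y (length p)
    for row in rows:
        feats = [1.0, *row[:-1]]  # leading 1.0 is the intercept column
        y = row[-1]
        if gram is None:
            p = len(feats)
            gram = [[0.0] * p for _ in range(p)]
            rhs = [0.0] * p
        for i in range(p):
            rhs[i] += feats[i] * y
            for j in range(p):
                gram[i][j] += feats[i] * feats[j]
    if gram is None:
        raise ValueError("no rows to fit")
    return solve(gram, rhs)


# Synthetic stand-in data: three inputs, known coefficients, small noise.
random.seed(0)
data = []
for _ in range(10_000):
    a, b, c = (random.uniform(-1, 1) for _ in range(3))
    y = 2.0 + 0.5 * a - 1.5 * b + 3.0 * c + random.gauss(0, 0.01)
    data.append((a, b, c, y))

coef = fit_linear(data)  # [b0, b1, b2, b3]
```

Because `fit_linear` only needs an iterable, the rows could just as well come from a generator that reads a file line by line, so memory use stays at O(p^2) regardless of n.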

    The practical way to estimate anything in such a large data set is to use a subset to perform the estimation: randomly select a subset of the records, fit the function on it, and check the fit against records you left out.
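If the records do not fit in memory, one way to draw a uniform random subset in a single pass is reservoir sampling. The sketch below is one possible approach, not something the original answer prescribes; `reservoir_sample`, `read_records`, and the sizes are hypothetical names and values chosen for illustration.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random from an iterable.

    Works in one pass and keeps only k records in memory. If the iterable
    has fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)      # fill the reservoir first
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                sample[j] = rec     # replace with probability k / (i + 1)
    return sample


def read_records(n):
    """Stand-in for reading rows from disk; yields (a, b, c, y) tuples."""
    rng = random.Random(1)
    for _ in range(n):
        a, b, c = rng.random(), rng.random(), rng.random()
        yield (a, b, c, a + b * c)


subset = reservoir_sample(read_records(200_000), k=1_000, seed=0)
```

The resulting `subset` can then be handed to whatever fitting routine you use. Repeating the draw with a different seed and comparing the fitted results gives a rough feel for how much the estimate depends on the particular subset.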

    If you don't care about your parameter values, and just want to estimate the mean function you will probably be better off using a non-parametric estimation technique.
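As one example of a non-parametric estimator of the mean function, the sketch below uses Nadaraya-Watson kernel regression: the prediction at a query point is a weighted average of the observed y values, with weights that decay with distance. This is a minimal sketch under stated assumptions; the Gaussian kernel, the hand-picked bandwidth, the test function, and the name `kernel_mean` are all illustrative choices, not part of the original answer.

```python
import math
import random


def kernel_mean(query, sample, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = query] with a Gaussian kernel.

    sample is a list of (x1, ..., xk, y) tuples; query is a sequence of k
    coordinates. Cost is O(len(sample)) per query, so it is meant to be run
    on a subset rather than on all records.
    """
    num = 0.0
    den = 0.0
    for *x, y in sample:
        d2 = sum((xi - qi) ** 2 for xi, qi in zip(x, query))
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += w * y
        den += w
    if den == 0.0:
        raise ValueError("no sample points near the query; increase bandwidth")
    return num / den


# Synthetic stand-in data: y = sin(a) + b*c plus small noise.
random.seed(0)
sample = []
for _ in range(5_000):
    a, b, c = (random.uniform(-1, 1) for _ in range(3))
    sample.append((a, b, c, math.sin(a) + b * c + random.gauss(0, 0.05)))

query = (0.3, -0.2, 0.5)
estimate = kernel_mean(query, sample, bandwidth=0.2)
truth = math.sin(0.3) + (-0.2) * 0.5
```

No functional form is assumed here, so there are no parameters like P1 or E1 to read off afterwards. The bandwidth controls the trade-off: smaller values follow the data more closely but are noisier, larger values smooth more but blur real structure.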

    Hopefully this helps.

    leif
