regression

Using Scipy curve_fit with variable number of parameters to optimize

落爺英雄遲暮 submitted on 2019-12-02 07:42:28
Question: Suppose we have the function below to optimize over 4 parameters. We have to write it as shown, and if we want the same function with more parameters, we have to rewrite the function definition.

    def radius (z,a0,a1,k0,k1,):
        k = np.array([k0,k1,])
        a = np.array([a0,a1,])
        w = 1.0
        phi = 0.0
        rs = r0 + np.sum(a*np.sin(k*z +w*t +phi), axis=1)
        return rs

The question is whether this can be done in an easier, more automatic way, and more intuitively than this question suggests. example would
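One possible approach, sketched here under assumptions rather than taken from the thread: accept the parameters as `*params` and let the length of `p0` tell `curve_fit` how many there are. The values of `R0` and `T` (standing in for the question's undefined `r0` and `t`), the synthetic data, and the starting guess are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

R0 = 1.0  # assumed base radius (r0 in the question; value not given there)
T = 0.0   # assumed time (t in the question; value not given there)

def radius(z, *params):
    """Sum of n sinusoids; params = (a0..a_{n-1}, k0..k_{n-1})."""
    n = len(params) // 2
    a = np.asarray(params[:n])
    k = np.asarray(params[n:])
    w, phi = 1.0, 0.0
    z = np.asarray(z, dtype=float)
    # np.outer(z, k) has shape (len(z), n); sum over the n terms.
    return R0 + np.sum(a * np.sin(np.outer(z, k) + w * T + phi), axis=1)

# Hypothetical noise-free data generated from two terms.
z = np.linspace(0.0, 5.0, 200)
true_params = (0.5, 0.2, 1.0, 3.0)
y = radius(z, *true_params)

# With a *params signature, curve_fit cannot inspect the argument count,
# so p0 must be supplied; its length sets the number of fitted parameters.
p0 = (0.4, 0.3, 1.1, 2.9)
popt, pcov = curve_fit(radius, z, y, p0=p0)
```

Passing a `p0` of length 6 would fit three terms with the same function definition. Sinusoid fits are sensitive to the starting frequencies, so a reasonable `p0` matters.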

`nls` fails to estimate parameters of my model

守給你的承諾、 submitted on 2019-12-02 06:15:19
I am trying to estimate the constants for Heaps' law. I have the following dataset, novels_colection:

      Number of novels  DistinctWords  WordOccurrences
    1                1          13575           117795
    2                1          34224           947652
    3                1          40353          1146953
    4                1          55392          1661664
    5                1          60656          1968274

Then I build the following function:

    # Function for Heaps' law
    heaps <- function(K, n, B){ K*n^B }
    heaps(2,117795,.7) # Just to test it works

So n = WordOccurrences, and K and B are values that should be constants in order to find my prediction of DistinctWords. I tried this, but it gives me an error:

    fitHeaps <- nls(DistinctWords ~ heaps(K,WordOccurrences,B), data =
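The excerpt is cut off before the error message, so the following does not diagnose the failure. It only sketches one common way to get rough values of K and B for a power law: fit a straight line to the logs, since log(V) = log(K) + B*log(n). The sketch is in Python rather than R, uses the five rows quoted above, and the variable names are illustrative.

```python
import numpy as np

# The five rows quoted in the question.
word_occurrences = np.array([117795, 947652, 1146953, 1661664, 1968274], dtype=float)
distinct_words = np.array([13575, 34224, 40353, 55392, 60656], dtype=float)

# Linear fit in log-log space: slope is B, intercept is log(K).
B, log_K = np.polyfit(np.log(word_occurrences), np.log(distinct_words), 1)
K = np.exp(log_K)

# Predictions from the fitted power law K * n**B.
predicted = K * word_occurrences ** B
```

Estimates obtained this way are often used as starting values for a nonlinear least-squares routine, because the log transform changes how the errors are weighted.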

Coding and Paper Letter (Part 1)

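随声附和 submitted on 2019-12-02 06:13:15
Recently I realized that, in an age of fast reading, it is worth collecting and organizing quick-to-digest resources. The focus is on Coding resources (mainly GitHub) and Paper resources (papers I have come across; for these I usually read only the title and abstract and write a brief summary).

1 Coding:
1. A cartogram plugin for QGIS. I will cover the topic of cartograms in a dedicated post later. qgis-cartogram source code
2. Open-source code for converting between "Mars coordinates" and Earth coordinates. Command-line version, Python version, project and documentation
3. Resources for GeoDa, the open-source spatial statistics software. GeoDa source code
4. PySAL, an open-source Python library for spatial statistical analysis. PySAL GitHub
5. A curated collection of GIS resource links. Awesome GIS
6. An R package (rasterVis) dedicated to visualizing raster data. Very powerful. rasterVis GitHub
7. A geodesign toolbox built on CityEngine. This project discusses a set of tools designed to let data-driven design support large-scale scenario planning projects. The tools are meant to integrate GIS and CityEngine to support creating large amounts of 3D content for urban planning/geodesign projects. The content created can be used to produce images as part of cut sheets (used together with Data Driven Pages), or be linked to web content in web maps (by providing the content that pop-ups or web scenes link to). The workflow presented here focuses on streets, but the scripts also support projects related to building/lot/zoning visualization. Intent: the purpose of these tools is, by combining GIS and CityEngine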

Finding a point that best fits the intersection of n spheres

ε祈祈猫儿з submitted on 2019-12-02 06:05:13
I have an array of points with distances. I wish to find a point that best satisfies the condition that

    for (point_i, distance_i) in pointArray:
        abs(point - point_i) = distance_i

I think this could be solved with some kind of regression or least squares, but I'm having trouble with the problem formulation. If anyone could help out, it would be greatly appreciated.

You need to define "best" to have an answerable question. What you probably want to do is define some sort of error function for how much being off from a given point matters, and then try to minimize the sum of the errors. The error
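As a minimal sketch of the "define an error function and minimize it" idea, assuming the error for each sphere is the difference between the candidate point's distance to the center and the target distance, and that squared errors are summed. The centers, the true point, and the starting guess below are hypothetical test data.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(point, centers, distances):
    """Per-sphere error: actual distance to each center minus the target distance."""
    return np.linalg.norm(centers - point, axis=1) - distances

# Hypothetical data: four non-coplanar centers and exact distances to a known point.
centers = np.array([[0.0, 0.0, 0.0],
                    [4.0, 0.0, 0.0],
                    [0.0, 4.0, 0.0],
                    [0.0, 0.0, 4.0]])
true_point = np.array([1.0, 2.0, 0.5])
distances = np.linalg.norm(centers - true_point, axis=1)

# Start from the centroid of the centers and minimize the sum of squared errors.
x0 = centers.mean(axis=0)
result = least_squares(residuals, x0, args=(centers, distances))
best_point = result.x
```

With noisy distances the residuals no longer all reach zero, and `best_point` is then the least-squares compromise. A different definition of "best" (for example absolute errors) would need a different loss.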

How to get the prediction of test from 2D parameters of WLS regression in statsmodels

坚强是说给别人听的谎言 submitted on 2019-12-02 05:51:49
I'm incrementally updating the parameters of WLS regression functions using statsmodels. I have a 10x3 dataset X that I declared like this:

    X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])

This is my dataset, and I have a 10x2 endog vector that looks like this:

    z = [[ 3.90311860e-322  2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000 -2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000 -2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000  2.00000000e+000]
     [ 0.00000000e+000 -2.00000000e+000]
     [ 0

Stata: combining coefficients/standard errors from several regressions in a single dataset (number of variables may differ)

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-02 05:33:17
I have already asked a question about storing coefficients and standard errors of several regressions in a single dataset. Let me just reiterate the objective of my initial question: I would like to run several regressions and store their results in a DTA file that I could later use for analysis. My constraints are:

- I cannot install modules (I am writing code for other people and am not sure what modules they have installed).
- Some of the regressors are factor variables.
- Each regression differs only by the dependent variable, so I would like to store that in the final dataset to keep track of what

Get p-value for group mean difference without refitting linear model with a new reference level

孤街浪徒 submitted on 2019-12-02 05:30:34
When we have a linear model with a factor variable X (with levels A, B, and C)

    y ~ factor(X) + Var2 + Var3

the result shows the estimates XB and XC, which are the differences B - A and C - A (supposing that the reference level is A). If we want to know the p-value of the difference between B and C, C - B, we have to designate B or C as the reference group and re-run the model. Can we get the p-values of the effects B - A, C - A, and C - B at one time?

李哲源: You are looking for a linear hypothesis test, which checks the p-value of some linear combination of regression coefficients. Based on my answer: How to conduct
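The linear-combination idea can be sketched as follows. This is an illustration in Python with NumPy and SciPy, not the R code from the answer (which is cut off above). The simulated data, the seed, and the helper name `contrast_test` are hypothetical. The difference C - B is the contrast (0, -1, 1, 0, 0) applied to the coefficients of a single fit with A as reference.

```python
import numpy as np
from scipy import stats

# Hypothetical data for y ~ factor(X) + Var2 + Var3 with levels A, B, C.
rng = np.random.default_rng(0)
n = 90
group = np.repeat(["A", "B", "C"], n // 3)
var2 = rng.normal(size=n)
var3 = rng.normal(size=n)
y = (1.0 + 0.5 * (group == "B") + 1.5 * (group == "C")
     + 0.3 * var2 - 0.2 * var3 + rng.normal(scale=0.5, size=n))

# Design matrix with A as reference: intercept, XB, XC, Var2, Var3.
X = np.column_stack([np.ones(n), group == "B", group == "C", var2, var3]).astype(float)

# Ordinary least squares and the coefficient covariance matrix.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)

def contrast_test(c):
    """Estimate, standard error and two-sided p-value for the combination c @ beta."""
    c = np.asarray(c, dtype=float)
    estimate = c @ beta
    se = np.sqrt(c @ cov @ c)
    p_value = 2.0 * stats.t.sf(abs(estimate / se), dof)
    return estimate, se, p_value

b_minus_a = contrast_test([0, 1, 0, 0, 0])
c_minus_a = contrast_test([0, 0, 1, 0, 0])
c_minus_b = contrast_test([0, -1, 1, 0, 0])  # no refit with a new reference level
```

The C - B estimate obtained this way equals the coefficient one would get for C after refitting with B as the reference level, so the refit is unnecessary.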

Coefficient table does not have NA rows in rank-deficient fit; how to insert them?

吃可爱长大的小学妹 submitted on 2019-12-02 05:16:53
Question:

    library(lmPerm)
    x <- lmp(formula = a ~ b * c + d + e, data = df, perm = "Prob")
    summary(x) # truncated output, I can see `NA` rows here!
    #Coefficients: (1 not defined because of singularities)
    #    Estimate Iter Pr(Prob)
    #b      5.874   51    1.000
    #c    -30.060  281    0.263
    #b:c       NA   NA       NA
    #d1   -31.333   60    0.633
    #d2    33.297  165    0.382
    #d3   -19.096   51    1.000
    #e      1.976   NA       NA

I want to pull out the Pr(Prob) results for everything, but

    y <- summary(x)$coef[, "Pr(Prob)"]
    #(Intercept) b c d1 d2
    # 0.09459459 1.00000000 0

How to specify covariates in a regression model

会有一股神秘感。 submitted on 2019-12-02 05:01:35
Question: The dataset I would like to analyse looks like this:

    n <- 4000
    tmp <- t(replicate(n, sample(49,6)))
    dat <- matrix(0, nrow=n, ncol=49)
    colnames(dat) <- paste("p", 1:49, sep="")
    dat <- as.data.frame(dat)
    dat[, "win.frac"] <- rnorm(n, mean=0.0176504, sd=0.002)
    for (i in 1:nrow(dat))
      for (j in 1:6)
        dat[i, paste("p", tmp[i, j], sep="")] <- 1
    str(dat)

Now I would like to perform a regression with the dependent variable win.frac and all other variables (p1, ..., p49) as explanatory variables. However,

Linear model singular because of large integer datetime in R?

核能气质少年 submitted on 2019-12-02 04:22:38
Simple regression of random normal on date fails, but identical data with small integers instead of dates works as expected.

    # Example dataset with 100 observations at 2 second intervals.
    set.seed(1)
    df <- data.frame(x=as.POSIXct("2017-03-14 09:00:00") + seq(0, 199, 2), y=rnorm(100))
    #> head(df)
    #                     x          y
    # 1 2017-03-14 09:00:00 -0.6264538
    # 2 2017-03-14 09:00:02  0.1836433
    # 3 2017-03-14 09:00:04 -0.8356286

    # Simple regression model.
    m <- lm(y ~ x, data=df)

The slope is missing due to singularities in the data. Calling the summary demonstrates this:

    summary(m)
    # Coefficients: (1 not defined
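A plausible reading of the title, which the truncated excerpt does not confirm: timestamps stored as seconds since the epoch are around 1.5e9, while they vary by only 200 across the sample, so the x column is almost a constant multiple of the intercept column. The sketch below (Python with NumPy, not R) compares the condition number of the design matrix before and after centering x. The epoch value assumes the timestamp is in UTC, which the question does not state.

```python
import numpy as np

# Assumed: 2017-03-14 09:00:00 interpreted as UTC, in seconds since 1970-01-01.
t0 = 1489482000.0
x = t0 + np.arange(0, 200, 2, dtype=float)  # 100 observations, 2 seconds apart

# Design matrices [intercept, x] with raw and with centered timestamps.
X_raw = np.column_stack([np.ones_like(x), x])
X_centered = np.column_stack([np.ones_like(x), x - x.mean()])

# A very large condition number means the columns are numerically near-collinear.
cond_raw = np.linalg.cond(X_raw)
cond_centered = np.linalg.cond(X_centered)
```

Centering (or measuring time in seconds from the first observation) leaves the slope unchanged and only shifts the intercept, which is why it is a common workaround when a fitting routine flags such a column as collinear.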