statistics

python statsmodels linear regression

早过忘川 submitted on 2019-12-24 06:23:14
Question: I am attempting to build a linear regression model from pre-project data and ultimately to calculate modeled data so that I can compare pre- and post-project data... Can anyone tell me what the best practice is? Otherwise I may be off in the weeds somewhere... For starters:
    import statsmodels.api as sm
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    ng = pd.read_csv('C:/Users/ngDataBaseline.csv', thousands=',', index_col='Date', parse_dates=True)
    ng.head()
This
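One common pattern for this kind of pre/post comparison is to fit the model on the baseline period only, then predict what the post-project period would have looked like without the project. The sketch below illustrates that idea under loud assumptions: it uses NumPy's least squares instead of statsmodels so that it runs without extra dependencies, and the driver variable, the helper names `fit_line` and `predict`, and all numbers are synthetic and hypothetical (nothing here comes from the question's CSV). With statsmodels, `sm.OLS(y, sm.add_constant(x)).fit()` would play the role of `fit_line`.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef  # [intercept, slope]

def predict(coef, x):
    """Evaluate the fitted line at new x values."""
    return coef[0] + coef[1] * np.asarray(x, dtype=float)

# Hypothetical baseline period: usage driven by one explanatory variable.
rng = np.random.default_rng(0)
x_pre = rng.uniform(0, 30, size=200)
y_pre = 50.0 + 4.0 * x_pre + rng.normal(0, 1.0, size=200)
coef = fit_line(x_pre, y_pre)

# Hypothetical post-project period: same driver, 10% lower usage.
x_post = rng.uniform(0, 30, size=100)
y_post = 0.9 * (50.0 + 4.0 * x_post)

# Modeled "what would have happened" minus what was actually observed.
modeled = predict(coef, x_post)
savings = modeled.sum() - y_post.sum()
```

The key design choice is that the post-project data never enters the fit; it is only compared against the baseline model's predictions.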

python statsmodels linear regression

混江龙づ霸主 submitted on 2019-12-24 06:22:48
Question: I am attempting to build a linear regression model from pre-project data and ultimately to calculate modeled data so that I can compare pre- and post-project data... Can anyone tell me what the best practice is? Otherwise I may be off in the weeds somewhere... For starters:
    import statsmodels.api as sm
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    ng = pd.read_csv('C:/Users/ngDataBaseline.csv', thousands=',', index_col='Date', parse_dates=True)
    ng.head()
This

mapping between words and a group tuple to get frequency of words

我们两清 submitted on 2019-12-24 04:24:12
Question: I have a dataframe that looks like the following:
    Utterance                           Frequency
    Directions to Starbucks             1045
    Show me directions to Starbucks     754
    Give me directions to Starbucks     612
    Navigate me to Starbucks            498
    Display navigation to Starbucks     376
    Direct me to Starbucks              201
    Navigate to Starbucks               180
Here, the data shows utterances made by people and how frequently each was said. I.e., "Directions to Starbucks" was uttered 1045 times, "Show me directions to Starbucks" was uttered 754 times
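Since the question is cut off, the exact grouping it asks for is unknown. As a minimal sketch of the part that is visible, the snippet below weights each word by the frequency of the utterance it appears in. The function name `word_frequencies` and the lowercasing and whitespace tokenization are assumptions for illustration; the rows are the ones from the table above.

```python
from collections import Counter

# Rows copied from the table in the question: (utterance, frequency).
utterances = [
    ("Directions to Starbucks", 1045),
    ("Show me directions to Starbucks", 754),
    ("Give me directions to Starbucks", 612),
    ("Navigate me to Starbucks", 498),
    ("Display navigation to Starbucks", 376),
    ("Direct me to Starbucks", 201),
    ("Navigate to Starbucks", 180),
]

def word_frequencies(rows):
    """Sum utterance frequencies per word (case-insensitive, split on whitespace)."""
    counts = Counter()
    for text, freq in rows:
        for word in text.lower().split():
            counts[word] += freq
    return counts

counts = word_frequencies(utterances)
```

`counts.most_common()` then lists the words from most to least frequent.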

Determining High Density Region for a distribution in R

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-24 04:13:08
Question: Background: Normally, R gives quantiles for well-known distributions. Of these quantiles, the interval from the lower 2.5% to the upper 97.5% covers 95% of the area under the distribution. Question: Suppose I have an F distribution (df1 = 10, df2 = 90). In R, how can I determine the 95% of the area under this distribution such that it covers only the HIGH DENSITY region, rather than the equal-tailed 95% that R normally gives (see my R code below)? Note: Clearly, the highest density is the "mode" (dashed line in the
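For a unimodal density, the highest-density region is the shortest interval that contains the requested mass. The question asks about R, but the idea can be sketched numerically in Python; this is an illustration under assumptions, not the asker's code. The helper names `f_pdf` and `shortest_interval`, the grid limits, and the grid resolution are all hypothetical choices, and the density is integrated with a simple trapezoid rule rather than an exact CDF.

```python
import math
import numpy as np

def f_pdf(x, d1, d2):
    """Density of the F(d1, d2) distribution for x > 0, computed on the log scale."""
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (0.5 * d1 * math.log(d1 / d2)
               + (0.5 * d1 - 1) * np.log(x)
               - 0.5 * (d1 + d2) * np.log1p(d1 * x / d2)
               - log_beta)
    return np.exp(log_pdf)

def shortest_interval(x, pdf, mass=0.95):
    """Shortest grid interval [lo, hi] holding at least `mass` of the area."""
    # Cumulative area by the trapezoid rule, normalised to end at 1.
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))])
    cdf /= cdf[-1]
    # For every candidate lower index, find the first upper index reaching the mass.
    hi_idx = np.searchsorted(cdf, cdf + mass, side="left")
    lo_idx = np.nonzero(hi_idx < len(x))[0]
    hi_idx = hi_idx[lo_idx]
    best = np.argmin(x[hi_idx] - x[lo_idx])
    return x[lo_idx[best]], x[hi_idx[best]]

# Assumed grid: the F(10, 90) density is negligible beyond x = 10.
x = np.linspace(1e-6, 10.0, 200001)
pdf = f_pdf(x, 10, 90)
lo, hi = shortest_interval(x, pdf, mass=0.95)
```

Unlike the equal-tailed interval, the two endpoints returned here have (approximately) equal density, which is the defining property of the highest-density region for a unimodal distribution.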

Matlab: “grouping mean”

拥有回忆 submitted on 2019-12-24 03:24:24
Question: Suppose I have the vectors:
    y = [1 1.01 1.02 1.03 2 2.01 2.02 3 3.01 3.02 3.03];
    c = [0 0 0 0 1 1 1 2 2 2 2 ];
Is there a vectorized way to get a "grouping mean", that is, the mean value of y for each unique value of c? (This is a simplified example; I have something similar, but the vector size is in the thousands and there are hundreds of values of c.) I can do it in a for-loop; I'm just wondering whether it could be vectorized. Here's my for-loop implementation:
    function [my,mc] = groupmean(y,c)
    my
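The vectorized idea is to map each value of c to a group index and then accumulate sums and counts per index. A minimal sketch in Python with NumPy follows, reusing the vectors from the question; the function name `group_mean` is illustrative. In MATLAB, the same pattern is usually written with the third output of `unique` fed into `accumarray`.

```python
import numpy as np

def group_mean(y, c):
    """Mean of y for each unique value of c, without an explicit loop."""
    y = np.asarray(y, dtype=float)
    # `inverse` holds, for every element, the index of its group in `groups`.
    groups, inverse = np.unique(c, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=y)   # per-group sum of y
    counts = np.bincount(inverse)            # per-group element count
    return sums / counts, groups

y = [1, 1.01, 1.02, 1.03, 2, 2.01, 2.02, 3, 3.01, 3.02, 3.03]
c = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
my, mc = group_mean(y, c)
```

Because the grouping goes through `np.unique`, the values in c do not need to be consecutive integers or sorted.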

Why does Google Analytics show less visits than One&One stats?

给你一囗甜甜゛ submitted on 2019-12-24 02:33:33
Question: Comparing Google Analytics results with One&One hosting monthly statistics shows a huge discrepancy. For last month, Google shows 1046 visits, while the One&One stats show 15304 unique visits. The Google code is in the footer, which appears on every page. I'm aware GA only works with JS enabled, but can there really be that many non-JS users?
Answer 1: Google Analytics is a good indicator of how many humans are visiting your website. Here are some things to check: how many bots are in your monthly stats? You can usually

Track multiple moving averages with Apache Commons Math DescriptiveStatistics

风格不统一 submitted on 2019-12-24 01:55:15
Question: I am using DescriptiveStatistics to track the moving average of some metrics. I have a thread that submits the metric value every minute, and I track the 10-minute moving average of the metric by using the setWindowSize(10) method on DescriptiveStatistics. This works fine for tracking a single moving average, but I actually need to track multiple moving averages, i.e. the 1-minute average, the 5-minute average, and the 10-minute average. Currently I have the following options: Have 3 different
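The options listed in the question are cut off, so the sketch below is only one possible approach, not necessarily one the asker considered: keep a single buffer sized for the largest window and compute each shorter average from its tail. It is written in Python rather than Java, and the class name `MultiWindowAverage` and its methods are hypothetical, not part of Apache Commons Math.

```python
from collections import deque

class MultiWindowAverage:
    """Track several moving averages from one shared buffer of recent values."""

    def __init__(self, windows=(1, 5, 10)):
        self.windows = tuple(sorted(windows))
        # Only the largest window's worth of samples needs to be retained.
        self.values = deque(maxlen=self.windows[-1])

    def add(self, value):
        self.values.append(float(value))

    def averages(self):
        """Return {window_size: mean of the last window_size samples}."""
        recent = list(self.values)
        result = {}
        for w in self.windows:
            tail = recent[-w:]  # fewer than w samples early on
            result[w] = sum(tail) / len(tail) if tail else float("nan")
        return result

tracker = MultiWindowAverage()
for minute_value in range(1, 13):  # one submitted value per minute
    tracker.add(minute_value)
averages = tracker.averages()
```

With one value submitted per minute, the window sizes correspond directly to minutes, which matches the 1, 5, and 10 minute averages described above.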

Track multiple moving averages with Apache Commons Math DescriptiveStatistics

孤人 submitted on 2019-12-24 01:55:10
Question: I am using DescriptiveStatistics to track the moving average of some metrics. I have a thread that submits the metric value every minute, and I track the 10-minute moving average of the metric by using the setWindowSize(10) method on DescriptiveStatistics. This works fine for tracking a single moving average, but I actually need to track multiple moving averages, i.e. the 1-minute average, the 5-minute average, and the 10-minute average. Currently I have the following options: Have 3 different

Calculating Median Absolute Deviation in C#

纵然是瞬间 submitted on 2019-12-24 01:39:06
Question: I am required to perform a number of statistical calculations on a number set, and one of the things I need to calculate is the Median Absolute Deviation. I was supplied with an ISO standard and all it tells me is [formula not shown]. I have no idea what to do with that info, as I do not have any statistical math training. As such, I can't translate the above into a C# function.
Answer 1: The median is the middle element of the sorted array (or the average of the two middle items if the array has an even number of items):
    double[] source = new
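The calculation itself is two medians: first the median of the data, then the median of the absolute distances from it. A minimal sketch in Python follows (the question asks for C#, and the function name here is illustrative). Because the ISO formula is not shown in the text, this is the plain, unscaled median absolute deviation; any scaling constant the standard may apply is deliberately not included.

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of the absolute deviations from the data's median (unscaled)."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)

sample = [1, 1, 2, 2, 4, 6, 9]
mad = median_absolute_deviation(sample)
```

For this sample the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4, 7, and their median is 1.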

Find only relevant points in MATLAB

余生长醉 submitted on 2019-12-24 01:39:04
Question: I have a MATLAB function that finds characteristic points in a sample. Unfortunately, it only works about 90% of the time. But when I know where in the sample I am supposed to look, I can increase this to almost 100%. So I would like to know if there is a function in MATLAB that would allow me to find the range where most of my results are, so I can then recalculate my characteristic points. I have a vector which stores all the results, and the right results should lie inside a range of
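One way to locate "the range where most results are" is to slide a window of fixed width over the sorted results and keep the window containing the most points. The sketch below shows that idea in Python rather than MATLAB. The question is cut off before stating the expected range, so the `width` parameter, the function name `densest_window`, and the sample values are all hypothetical.

```python
import numpy as np

def densest_window(values, width):
    """Return (low, high, count) for the fixed-width window holding the most values."""
    v = np.sort(np.asarray(values, dtype=float))
    # For a window starting at each sorted value, find where it stops covering data.
    ends = np.searchsorted(v, v + width, side="right")
    counts = ends - np.arange(len(v))
    start = int(np.argmax(counts))
    return v[start], v[start] + width, int(counts[start])

# Made-up results: most cluster near 10, with a few stray detections.
results = [10.1, 10.3, 9.9, 10.0, 55.0, 10.2, 3.0, 10.4, 10.15, 31.0]
low, high, count = densest_window(results, width=1.0)
```

Results outside `[low, high]` can then be treated as misdetections and recomputed with the search restricted to that range.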