time-series

How to convert a very large dataset to xts? - as.xts fails on 1.5M rows

别来无恙, submitted 2021-02-10 18:41:49
Question: I have the data:

    > dput(head(data))
    structure(list(Gmt.time = c("01.06.2015 00:00", "01.06.2015 00:01",
    "01.06.2015 00:02", "01.06.2015 00:03", "01.06.2015 00:04", "01.06.2015 00:05"
    ), Open = c(0.88312, 0.88337, 0.88377, 0.88412, 0.88393, 0.8838
    ), High = c(0.88337, 0.88378, 0.88418, 0.88418, 0.88393, 0.88393
    ), Low = c(0.883, 0.88337, 0.88374, 0.88394, 0.88368, 0.88362
    ), Close = c(0.88337, 0.88375, 0.88412, 0.88394, 0.8838, 0.88393
    ), Volume = c(83.27, 100.14, 117.18, 52

Tensorflow keras timeseries prediction with X and y having different shapes

不羁的心, submitted 2021-02-10 15:44:10
Question: I am trying to do time series prediction with TensorFlow and Keras, with X and y having different dimensions:

    X.shape = (5000, 12)
    y.shape = (5000, 3, 12)

When I do the following

    n_input = 7
    generator = TimeseriesGenerator(X, y, length=n_input, batch_size=1)
    for i in range(5):
        x_, y_ = generator[i]
        print(x_.shape)
        print(y_.shape)

I get, as desired, the output

    (1, 7, 12)
    (1, 3, 12)
    (1, 7, 12)
    (1, 3, 12)
    ...

This is because my data is meteorological: I have 5000 days, for training in the array X I

Pandas Time Series DataFrame Missing Values

半腔热情, submitted 2021-02-10 06:09:06
Question: I have a dataset of Total Sales from 2008-2015. I have an entry for each day, so I have created a pandas DataFrame with a DatetimeIndex and a column for sales. It looks like this: (not shown)

The problem is that I am missing data for most of 2010. These missing values are currently represented by 0.0, so if I plot the DataFrame I get: (not shown)

I want to try to forecast values for 2016, possibly using an ARIMA model, so the first step I took was to perform a decomposition of this time series. Obviously if I

How to calculate p-values from cross-correlation function in R

风流意气都作罢, submitted 2021-02-10 05:20:40
Question: I calculated a cross-correlation of two time series using ccf() in R. I know how to derive the confidence limits as:

    ccf1 <- ccf(x=x, y=y, lag.max=5, na.action=na.pass, plot=F)
    upperCI <- qnorm((1+0.95)/2)/sqrt(ccf1$n.used)
    lowerCI <- -qnorm((1+0.95)/2)/sqrt(ccf1$n.used)

But what I really need is the p-value of the maximum correlation.

    ind.max <- which(abs(ccf1$acf[1:11])==max(abs(ccf1$acf[1:11])))
    max.cor <- ccf1$acf[ind.max]
    lag.opt <- ccf1$lag[ind.max]

How do I calculate this p-value? I have
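
One possible reading, sketched here in Python rather than R: the confidence limits in the question rest on the large-sample approximation that, under the null hypothesis of no correlation, each cross-correlation estimate is roughly normal with standard deviation 1/sqrt(n.used). The same approximation gives a two-sided p-value. The function name `ccf_p_value` and the example numbers are hypothetical. Because the maximum is picked from 11 lags, the raw p-value is optimistic; the Bonferroni factor below is one conservative correction and is an assumption of this sketch, not something stated in the question.

```python
from math import sqrt
from statistics import NormalDist


def ccf_p_value(r, n_used, n_lags=1):
    """Two-sided p-value for a cross-correlation r, assuming
    r ~ N(0, 1/n_used) under the null hypothesis of no correlation.

    n_lags > 1 applies a Bonferroni correction for having selected the
    largest of n_lags correlations (an assumption of this sketch).
    """
    z = abs(r) * sqrt(n_used)               # standardised correlation
    p = 2.0 * (1.0 - NormalDist().cdf(z))   # two-sided tail probability
    return min(1.0, p * n_lags)             # cap at 1 after correction


# Hypothetical numbers: maximum |correlation| of 0.30 from 100 observations,
# selected among 11 lags (lag.max = 5 gives lags -5..5).
p_raw = ccf_p_value(0.30, 100)
p_adj = ccf_p_value(0.30, 100, n_lags=11)
print(f"raw p = {p_raw:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```

A correlation sitting exactly on the 95% limit from the question maps to a raw p-value of about 0.05 under this approximation, which is a quick consistency check. The approximation also ignores autocorrelation within each series.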

Groupby and resample timeseries so date ranges are consistent

萝らか妹, submitted 2021-02-09 10:55:23
Question: I have a dataframe which is basically several time series stacked on top of one another. Each time series has a unique label (group), and they have different date ranges.

    date = pd.to_datetime(pd.Series(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-06', '2010-01-01', '2010-01-03']))
    group = [1,1,1,1, 2, 2]
    value = [1,2,3,4,5,6]
    df = pd.DataFrame({'date':date, 'group':group, 'value':value})
    df

            date  group  value
    0 2010-01-01      1      1
    1 2010-01-02      1      2
    2 2010-01-03      1      3
    3 2010-01-06      1      4
    4 2010-01-01
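
The excerpt stops before the desired output, so the following is only a guess at the goal: every group should cover the same daily date range, with gaps left empty. It is a minimal standard-library Python sketch of that idea using the question's sample rows; `align_groups` and the `fill` parameter are hypothetical names. In pandas the same idea would typically be expressed as a per-group reindex onto a shared date range.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, group, value)
rows = [
    (date(2010, 1, 1), 1, 1),
    (date(2010, 1, 2), 1, 2),
    (date(2010, 1, 3), 1, 3),
    (date(2010, 1, 6), 1, 4),
    (date(2010, 1, 1), 2, 5),
    (date(2010, 1, 3), 2, 6),
]


def align_groups(rows, fill=None):
    """Give every group one row per day between the overall earliest
    and latest date; days without an observation get `fill`."""
    lookup = {(d, g): v for d, g, v in rows}
    start = min(d for d, _, _ in rows)
    end = max(d for d, _, _ in rows)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    groups = sorted({g for _, g, _ in rows})
    return [(d, g, lookup.get((d, g), fill)) for g in groups for d in days]


aligned = align_groups(rows)
for d, g, v in aligned:
    print(d, g, v)
```

Each group ends up with six rows (2010-01-01 through 2010-01-06). Whether the gaps should stay empty, be zero-filled, or be forward-filled depends on the part of the question that is cut off.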

Identify missing hours - find the gaps in time

强颜欢笑, submitted 2021-02-08 10:12:27
Question: I have a table with hours, but there are gaps. I need to find which hours are missing.

    select datehour from stored_hours order by 1;

The gaps in this timeline are easy to find:

    select lag(datehour) over(order by datehour) since, datehour until
      , timestampdiff(hour, lag(datehour) over(order by datehour), datehour) - 1 missing
    from stored_hours
    qualify missing > 0

How can I create a list of the missing hours during these days? (with Snowflake and SQL)

Answer 1: To create a list/table of the
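
The answer is cut off here. To illustrate only the underlying logic (walk every hour between the first and last stored hour and keep the ones that are absent), here is a small Python sketch. It is not Snowflake SQL and not the original answer. The function name `missing_hours` and the sample timestamps are hypothetical, and it assumes every stored value is already truncated to a whole hour.

```python
from datetime import datetime, timedelta


def missing_hours(stored):
    """Return every whole hour between min(stored) and max(stored)
    that does not appear in stored."""
    present = set(stored)
    if not present:
        return []
    step = timedelta(hours=1)
    current, last = min(present), max(present)
    gaps = []
    while current < last:
        current += step
        if current not in present:
            gaps.append(current)
    return gaps


# Hypothetical sample: hours 02:00 and 03:00 are absent.
stored_hours = [
    datetime(2021, 2, 8, 0),
    datetime(2021, 2, 8, 1),
    datetime(2021, 2, 8, 4),
    datetime(2021, 2, 8, 5),
]
for h in missing_hours(stored_hours):
    print(h)
```

In SQL the equivalent idea is usually to generate the full hourly series for the period and anti-join it against the stored hours.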

GMM/EM on time series cluster

你。, submitted 2021-02-08 10:07:40
Question: According to a paper, it is supposed to work, but as a learner of the scikit-learn package I do not see how. All the sample code clusters by ellipses or circles, as here. I would really like to know how to cluster the following plot by different patterns. Here 0-3 are the mean of power over certain time periods (divided into 4), while 4, 5, and 6 correspond to the standard deviation of the year, the variance in weekday/weekend, and the variance in winter/summer, respectively. So the y-axis label does not necessarily apply to 4, 5, and 6.