data-science

Pandas number of consecutive occurrences in previous rows

爷,独闯天下 submitted on 2019-12-24 11:49:12
Question: I have OHLC data. A candle is either 'green' (the close is above the open) or 'red' (the open is above the close). The format is:

       open  close candletype
    0   542    543      GREEN
    1   543    544      GREEN
    2   544    545      GREEN
    3   545    546      GREEN
    4   546    547      GREEN
    5   547    542        RED
    6   542    543      GREEN

What I would like is to count the number of consecutive green or red candles in the n previous rows. Let's say I want to identify rows preceded by 3 green candles. Then the desired output would be:

       open  close candletype  pattern
    0
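
One way to approach the question above, as a minimal sketch rather than the accepted answer (which is not shown in this excerpt): label each run of identical candles, count the position inside the run, then shift by one row so every row sees the streak that ended just before it. The column names `prev_green_streak` and `pattern` and the threshold of 3 are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "open":  [542, 543, 544, 545, 546, 547, 542],
    "close": [543, 544, 545, 546, 547, 542, 543],
})
df["candletype"] = ["GREEN" if c > o else "RED"
                    for o, c in zip(df["open"], df["close"])]

is_green = df["candletype"].eq("GREEN")

# A new run starts whenever the candle type differs from the previous row.
run_id = df["candletype"].ne(df["candletype"].shift()).cumsum()

# Length of the current streak, counted within each run (1, 2, 3, ...).
streak = df.groupby(run_id).cumcount() + 1

# Green streak ending on the previous row (0 if the previous row was red).
df["prev_green_streak"] = streak.where(is_green, 0).shift(fill_value=0)

# Hypothetical flag: the row is preceded by at least 3 green candles.
df["pattern"] = df["prev_green_streak"] >= 3

print(df)
```

Swapping `is_green` for a red mask gives the same count for red candles.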

Cannot retrieve Datasets in PyTables using natural naming

半腔热情 submitted on 2019-12-24 11:31:57
Question: I'm new to PyTables and I want to retrieve a dataset from an HDF5 file using natural naming, but I get an error with this input:

    f = tables.open_file("filename.h5", "r")
    f.root.group-1.dataset-1.read()

    group / does not have a child named group

and if I try:

    f.root.group\-1.dataset\-1.read()

    group / does not have a child named group
    unexpected character after line continuation character

I can't change the group names because this is a large dataset from an experiment.

Answer 1: You can't use the minus
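
The error in the question comes from Python syntax rather than from the file: `f.root.group-1` is parsed as `(f.root.group) - 1`. The sketch below reproduces this with a plain `SimpleNamespace` standing in for the PyTables node tree (an assumption, so that it runs without an HDF5 file) and shows that `getattr` reaches names that are not valid identifiers. With PyTables itself, `getattr(f.root, "group-1")` or `f.get_node("/group-1/dataset-1")` should play the same role; that part is not run here.

```python
from types import SimpleNamespace

# Stand-in for f.root: a tree whose child names contain a minus sign.
root = SimpleNamespace()
setattr(root, "group-1", SimpleNamespace())
setattr(getattr(root, "group-1"), "dataset-1", [1, 2, 3])

# Dotted access is parsed as (root.group) - 1, so Python looks up "group".
try:
    root.group - 1
except AttributeError as exc:
    error_message = str(exc)

# getattr takes the name as a string, so any child name works.
data = getattr(getattr(root, "group-1"), "dataset-1")

print(error_message)
print(data)
```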

Error in mutate_impl(.data, dots) using “join” code

*爱你&永不变心* submitted on 2019-12-24 09:44:52
Question: I have a dataset with 100000 rows, where order_date holds the order date and user_id holds the user's ID. I am trying to create a new variable that shows each user's total number of orders within the same day. My data looks like this:

    order_date=structure(c(15587, 15647, 15734, 15560, 15599, 15778, 15708, 15520, 15592, 15447, 15718, 15787, 15519, 15486, 15514, 15784, 15619, 15705, 15552, 15734, 15493, 15661, 15563, 15600, 15790, 15485, 15546, 15767, 15704, 15726), class = "Date")
    user_id=c(22607, 28275

Fastest way to eliminate specific dates from pandas dataframe

旧城冷巷雨未停 submitted on 2019-12-24 08:13:24
Question: I'm working with a large data frame and I'm struggling to find an efficient way to eliminate specific dates. Note that I'm trying to eliminate all measurements from a specific date. Pandas has this great function, where you can call:

    df.ix['2016-04-22']

and pull all rows from that day. But what if I want to eliminate all rows from '2016-04-22'? I want a function like this:

    df.ix[~'2016-04-22']

(but that doesn't work). Also, what if I want to eliminate a list of dates? Right now, I have the
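
A minimal sketch of one way to do this, assuming the frame has a `DatetimeIndex` (the question implies this but the excerpt does not show the data). Note that `.ix` was removed in pandas 1.0, so the sketch uses a boolean mask instead. The names `drop_dates` and `kept` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: timestamped measurements across three days.
idx = pd.to_datetime([
    "2016-04-21 09:00",
    "2016-04-22 10:30",
    "2016-04-22 15:45",
    "2016-04-23 08:15",
])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Dates to eliminate; a longer list works the same way.
drop_dates = pd.to_datetime(["2016-04-22"])

# normalize() sets every timestamp to midnight, so isin() compares by day.
mask = df.index.normalize().isin(drop_dates)
kept = df[~mask]

print(kept)
```

Because the mask is computed in one vectorized pass over the index, this tends to scale better than dropping the dates one at a time in a loop.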

Create conditional column for Date Difference based on matching values in two columns

天涯浪子 submitted on 2019-12-24 07:26:30
Question: I have a dataframe and I am struggling to create a column based on other columns. I'll illustrate the problem with sample data.

             Date   Target1       Close
    0  2019-04-17  209.2440  203.130005
    1  2019-04-17  212.2155  203.130005
    2  2019-04-17  213.6330  203.130005
    3  2019-04-17  213.0555  203.130005
    4  2019-04-17  212.6250  203.130005
    5  2019-04-17  212.9820  203.130005
    6  2019-04-17  213.1395  203.130005
    7  2019-04-16  209.2860  199.250000
    8  2019-04-16  209.9055  199.250000
    9  2019-04-16  210.3045  199.250000

I want to create

How to calculate the steepness of a trend in python

佐手、 submitted on 2019-12-24 06:49:57
Question: I am using the regression slope as follows to calculate the steepness (slope) of the trend.

Scenario 1: For example, consider I am using sales figures (x-axis: 1, 4, 6, 8, 10, 15) for 6 days (y-axis).

    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    X = [[1], [4], [6], [8], [10], [15]]
    y = [1, 2, 3, 4, 5, 6]
    regressor.fit(X, y)
    print(regressor.coef_)

This gives me 0.37709497

Scenario 2: When I run the same program for a different set of sales figures (e.g., 1, 2, 3, 4,
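
For reference, the coefficient in Scenario 1 can be reproduced without scikit-learn from the closed-form least-squares slope, the covariance of x and y divided by the variance of x. This is a sketch of that formula only; the helper name `ols_slope` is hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line fitted to y as a function of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx

# Scenario 1 from the question: sales on the x-axis, days on the y-axis.
xs = [1, 4, 6, 8, 10, 15]
ys = [1, 2, 3, 4, 5, 6]

slope = ols_slope(xs, ys)
print(round(slope, 8))  # 0.37709497
```

The slope is expressed in units of y per unit of x, so its magnitude depends on the scale of the figures placed on the x-axis.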

Is my training data set too complex for my neural network?

半城伤御伤魂 submitted on 2019-12-24 03:35:09
Question: I am new to machine learning and Stack Overflow. I am trying to interpret two graphs from my regression model: the training error and the validation error from my machine learning model. My case is similar to the question "Very large loss values when training multiple regression model in Keras", but my MSE and RMSE are very high. Is my model underfitting? If yes, what can I do to solve this problem? Here is the neural network I used for solving a regression problem:

    def build_model(): model = keras

Specific postgresql server configuration for data analysis purposes

ε祈祈猫儿з submitted on 2019-12-23 19:35:09
Question: Are there any tips on tuning a server's performance through the postgresql.conf file when a PostgreSQL database is used specifically by a data science department for data analysis? Or is performance tuning itself purpose-agnostic, so that it makes no real difference what you do with the database, since 'it is all about extracting data'? It's a rather obscure question that I didn't find an answer to (in myriads of articles on data science).

Answer 1: Though this is a very general question, I'll try my

How to load an excel sheet and clean the data in python?

本小妞迷上赌 submitted on 2019-12-23 06:16:46
Question: Load the energy data from the file Energy Indicators.xls, which is a list of indicators of energy supply and renewable electricity production from the United Nations for the year 2013, and should be put into a DataFrame with the variable name of energy. Keep in mind that this is an Excel file, and not a comma separated values file. Also, make sure to exclude the footer and header information from the datafile. The first two columns are unnecessary, so you should get rid of them, and you

How to load an excel sheet and clean the data in python?

…衆ロ難τιáo~ submitted on 2019-12-23 06:15:40
Question: Load the energy data from the file Energy Indicators.xls, which is a list of indicators of energy supply and renewable electricity production from the United Nations for the year 2013, and should be put into a DataFrame with the variable name of energy. Keep in mind that this is an Excel file, and not a comma separated values file. Also, make sure to exclude the footer and header information from the datafile. The first two columns are unnecessary, so you should get rid of them, and you