data-science | 易学教程

Pandas number of consecutive occurrences in previous rows

Question: I have OHLC data. A candle is either 'green' (the close is above the open) or 'red' (the open is above the close). The format is:

       open  close candletype
    0   542    543      GREEN
    1   543    544      GREEN
    2   544    545      GREEN
    3   545    546      GREEN
    4   546    547      GREEN
    5   547    542        RED
    6   542    543      GREEN

I would like to count the number of consecutive green or red candles in the n previous rows. Let's say I want to identify rows preceded by 3 green candles. Then the desired output would be:

       open  close candletype  pattern
    0
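A minimal sketch of one way to flag such rows, assuming "preceded by 3 green candles" means that the n rows immediately before the current row are all GREEN. The desired output in the question is cut off, so this reading is an assumption, and the names `n` and `preceded_by_n_green` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "open": [542, 543, 544, 545, 546, 547, 542],
    "close": [543, 544, 545, 546, 547, 542, 543],
    "candletype": ["GREEN"] * 5 + ["RED", "GREEN"],
})

n = 3  # number of previous rows to inspect (illustrative parameter)

# 1 for a green candle, 0 otherwise.
is_green = df["candletype"].eq("GREEN").astype(int)

# shift(1) excludes the current row; rolling(n).sum() counts green candles
# among the n previous rows (NaN where fewer than n previous rows exist).
green_in_previous_n = is_green.shift(1).rolling(n).sum()

# True only where all n previous candles were green.
df["preceded_by_n_green"] = green_in_previous_n.eq(n)
print(df)
```

Under this reading, rows 3, 4 and 5 are flagged. The same idea covers red candles by comparing against "RED" instead.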

Cannot retrieve Datasets in PyTables using natural naming

Question: I'm new to PyTables and I want to retrieve a dataset from an HDF5 file using natural naming, but I get an error with this input:

    f = tables.open_file("filename.h5", "r")
    f.root.group-1.dataset-1.read()

    group / does not have a child named group

and if I try:

    f.root.group\-1.dataset\-1.read()

    group / does not have a child named group
    unexpected character after line continuation character

I can't change the names of the groups because this is a large dataset from an experiment.

Answer 1: You can't use the minus
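Python parses `f.root.group-1` as the subtraction `f.root.group - 1`, which is why PyTables reports a missing child named `group`. The sketch below shows two ways to reach such nodes without natural naming: by full path with `File.get_node`, and by string name with `getattr`. It builds a throwaway file so that it runs on its own. The file name `example.h5` and the array contents are hypothetical, not the asker's data.

```python
import os
import tempfile
import warnings

import tables

# Build a small throwaway file whose node names contain a minus sign.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with warnings.catch_warnings():
    # PyTables warns when a node name is not a valid Python identifier.
    warnings.simplefilter("ignore")
    with tables.open_file(path, "w") as f:
        group = f.create_group("/", "group-1")
        f.create_array(group, "dataset-1", [1, 2, 3])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    with tables.open_file(path, "r") as f:
        # Option 1: address the node by its full path.
        by_path = list(f.get_node("/group-1/dataset-1").read())
        # Option 2: getattr takes the name as a string, so "-" is never
        # interpreted as the subtraction operator.
        by_getattr = list(getattr(getattr(f.root, "group-1"), "dataset-1").read())

print(by_path, by_getattr)
```

Both options leave the names in the file unchanged, which matters here because the asker cannot rename the groups.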

Error in mutate_impl(.data, dots) using “join” code

Question: I have a dataset with 100000 rows, where order_date shows the order date and user_id shows the user's ID. I am trying to create a new variable that shows the user's total orders within the same day. My data looks like this:

    order_date=structure(c(15587, 15647, 15734, 15560, 15599, 15778, 15708, 15520, 15592, 15447, 15718, 15787, 15519, 15486, 15514, 15784, 15619, 15705, 15552, 15734, 15493, 15661, 15563, 15600, 15790, 15485, 15546, 15767, 15704, 15726), class = "Date")
    user_id=c(22607, 28275

Fastest way to eliminate specific dates from pandas dataframe

Question: I'm working with a large data frame and I'm struggling to find an efficient way to eliminate specific dates. Note that I'm trying to eliminate all measurements from a specific date. Pandas has this great function, where you can call:

    df.ix['2016-04-22']

and pull all rows from that day. But what if I want to eliminate all rows from '2016-04-22'? I want a function like this:

    df.ix[~'2016-04-22']

(but that doesn't work). Also, what if I want to eliminate a list of dates? Right now, I have the
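One possible approach, sketched under the assumption that the frame has a `DatetimeIndex` (as the `df.ix['2016-04-22']` lookup suggests): build a boolean mask of the rows that fall on the unwanted days and invert it. Note that `.ix` has been removed from current pandas, so the sketch uses plain boolean indexing. The sample frame and the names `dates_to_drop`, `mask`, and `filtered` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with intraday timestamps.
index = pd.to_datetime([
    "2016-04-21 09:00",
    "2016-04-22 09:00",
    "2016-04-22 15:00",
    "2016-04-23 09:00",
    "2016-04-24 09:00",
])
df = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=index)

# One date or many: the same code handles both.
dates_to_drop = pd.to_datetime(["2016-04-22", "2016-04-24"])

# normalize() sets every timestamp to midnight, so rows can be compared
# against whole calendar days regardless of their time of day.
mask = df.index.normalize().isin(dates_to_drop)

# Keep only the rows that are NOT on the unwanted dates.
filtered = df[~mask]
print(filtered)
```

The mask is computed in one vectorized pass over the index, which avoids looping over dates. Whether it is the fastest option for a given frame would need to be measured.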

Create conditional column for Date Difference based on matching values in two columns

Question: I have a dataframe and I am struggling to create a column based on other columns. I will show the problem with sample data:

             Date   Target1       Close
    0  2019-04-17  209.2440  203.130005
    1  2019-04-17  212.2155  203.130005
    2  2019-04-17  213.6330  203.130005
    3  2019-04-17  213.0555  203.130005
    4  2019-04-17  212.6250  203.130005
    5  2019-04-17  212.9820  203.130005
    6  2019-04-17  213.1395  203.130005
    7  2019-04-16  209.2860  199.250000
    8  2019-04-16  209.9055  199.250000
    9  2019-04-16  210.3045  199.250000

I want to create

How to calculate the steepness of a trend in python

Question: I am using the regression slope as follows to calculate the steepness (slope) of the trend. Scenario 1: For example, consider I am using sales figures (x-axis: 1, 4, 6, 8, 10, 15) for 6 days (y-axis).

    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    X = [[1], [4], [6], [8], [10], [15]]
    y = [1, 2, 3, 4, 5, 6]
    regressor.fit(X, y)
    print(regressor.coef_)

This gives me 0.37709497. Scenario 2: When I run the same program for a different sale figure (e.g., 1, 2, 3, 4,
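With a single feature, the coefficient returned by `LinearRegression` is the ordinary least-squares slope, cov(x, y) / var(x). The sketch below recomputes the Scenario 1 value with plain Python to make that explicit. The helper name `ols_slope` is an illustration, not part of scikit-learn.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Scenario 1 from the question: sales figures as x, day number as y.
x = [1, 4, 6, 8, 10, 15]
y = [1, 2, 3, 4, 5, 6]

slope = ols_slope(x, y)
print(round(slope, 8))  # 0.37709497, matching regressor.coef_ in the question
```

Because the slope is expressed in units of y per unit of x, its magnitude depends on the scale of the sales figures, so slopes from differently scaled series are not directly comparable.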

Is my training data set too complex for my neural network?

Question: I am new to machine learning and Stack Overflow, and I am trying to interpret two graphs from my regression model: the training error and the validation error. My case is similar to the question "Very large loss values when training multiple regression model in Keras", but my MSE and RMSE are very high. Is my model underfitting? If yes, what can I do to solve this problem? Here is the neural network I used for solving a regression problem:

    def build_model():
        model = keras

Specific postgresql server configuration for data analysis purposes

Question: Are there any tips for tuning server performance through the postgresql.conf file when a PostgreSQL database is used specifically by a data science department for data analysis? Or is performance tuning itself purpose-agnostic, so that it makes no real difference what you do with the database, since 'it is all about extracting data'? It's a rather obscure question that I couldn't find an answer to (in myriads of articles on data science topics).

Answer 1: Though this is a very general question, I'll try my

How to load an excel sheet and clean the data in python?

Question: Load the energy data from the file Energy Indicators.xls, which is a list of indicators of energy supply and renewable electricity production from the United Nations for the year 2013, and put it into a DataFrame with the variable name energy. Keep in mind that this is an Excel file, not a comma-separated values file. Also, make sure to exclude the footer and header information from the data file. The first two columns are unnecessary, so you should get rid of them, and you
