pandas

Heikin-Ashi using pandas (Python)

Submitted by 我怕爱的太早我们不能终老 on 2021-02-17 09:06:38

Question: I was writing a function for Heikin-Ashi, one of the popular chart types in technical analysis, using pandas, but ran into some difficulty. These are the Heikin-Ashi candle calculations:

HA_Close = (Open + High + Low + Close) / 4
HA_Open  = (previous HA_Open + previous HA_Close) / 2
HA_Low   = minimum of Low, HA_Open, and HA_Close
HA_High  = maximum of High, HA_Open, and HA_Close

and on the first run:

HA_Close = (Open + High + Low + Close) / 4
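A minimal sketch of these formulas in pandas, assuming a DataFrame with Open/High/Low/Close columns. Seeding the first HA_Open with (Open + Close) / 2 is a common convention but an assumption here, since the question is cut off before its first-run rule for HA_Open:

```python
import pandas as pd

def heikin_ashi(df: pd.DataFrame) -> pd.DataFrame:
    """Compute Heikin-Ashi candles from regular OHLC data."""
    ha = pd.DataFrame(index=df.index)
    ha["HA_Close"] = (df["Open"] + df["High"] + df["Low"] + df["Close"]) / 4
    # HA_Open is recursive, so build it iteratively; the first bar is
    # seeded with (Open + Close) / 2 (an assumption, see above).
    ha_open = [(df["Open"].iloc[0] + df["Close"].iloc[0]) / 2]
    for i in range(1, len(df)):
        ha_open.append((ha_open[i - 1] + ha["HA_Close"].iloc[i - 1]) / 2)
    ha["HA_Open"] = ha_open
    ha["HA_High"] = pd.concat(
        [df["High"], ha["HA_Open"], ha["HA_Close"]], axis=1).max(axis=1)
    ha["HA_Low"] = pd.concat(
        [df["Low"], ha["HA_Open"], ha["HA_Close"]], axis=1).min(axis=1)
    return ha

ohlc = pd.DataFrame({"Open": [10, 11], "High": [12, 13],
                     "Low": [9, 10], "Close": [11, 12]})
print(heikin_ashi(ohlc))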

Print predict ValueError: Expected 2D array, got 1D array instead

Submitted by ぃ、小莉子 on 2021-02-17 07:19:29

Question: The error shows up in my last two lines of code:

ValueError: Expected 2D array, got 1D array instead: array=[0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit
%matplotlib inline

df = pd.read_csv('.......csv')
df.drop(['Company'], 1, inplace=True)
x = pd.DataFrame(df.drop(['R&D Expense'], 1))
y = pd.DataFrame(df['R&D Expense'])
```
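The failing lines themselves are cut off above, but the error message points at the fix: scikit-learn estimators expect a 2D array of shape (n_samples, n_features), so a single sample has to be reshaped before calling predict. A minimal sketch with invented toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: three samples, two features each.
X = np.array([[0, 1], [1, 2], [2, 3]])
y = np.array([1.0, 2.0, 3.0])
model = LinearRegression().fit(X, y)

sample = np.array([0, 1])                    # 1D, shape (2,) -> raises the ValueError
print(model.predict(sample.reshape(1, -1)))  # 2D, shape (1, 2): one sample, two features
```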

Get column name based on condition in pandas

Submitted by [亡魂溺海] on 2021-02-17 07:18:07

Question: I have a dataframe (shown as an image in the original post). For each row, I want to get the names of the columns that contain a 1, e.g. for row 1: Blanks; for row 2: Manufacturing; for row 3: Manufacturing; for row 4: Manufacturing; for row 5: Social, Finance, Analytics, Advertising. Right now I am only able to get the complete row:

```python
primary_sectors = lambda primary_sector: sectors[
    sectors["category_list"] == primary_sector
]
```

Please help me get the name of the column.
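The original dataframe is not available here; a minimal sketch against an invented one-hot frame of the same shape, joining the names of the columns that hold a 1 in each row:

```python
import pandas as pd

# Invented stand-in for the one-hot sector dataframe in the question.
df = pd.DataFrame({
    "Blanks":        [1, 0, 0, 0, 0],
    "Manufacturing": [0, 1, 1, 1, 0],
    "Social":        [0, 0, 0, 0, 1],
    "Finance":       [0, 0, 0, 0, 1],
}, index=range(1, 6))

# For each row, keep the column names where the value is 1.
names = df.eq(1).apply(lambda row: ", ".join(df.columns[row]), axis=1)
print(names)
```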

Python Pandas: How to convert my table from a long format to wide format (specific example below)?

Submitted by 我与影子孤独终老i on 2021-02-17 07:09:49

Question: Pretty much the title. I am attaching the spreadsheet here; I need to convert the "Input" sheet to the "Output" sheet. I know about pandas wide_to_long, but I haven't been able to use it to produce the desired output: the rows get scrambled up.

```python
import pandas as pd

df = pd.read_excel('../../Downloads/test.xlsx', sheet_name='Input', header=0)
newdf = (pd.wide_to_long(df, [str(i) for i in range(2022, 2028)],
                         'Hotel Name', 'value', sep='', suffix='.+')
         .reset_index()
         .sort_values('Hotel Name'))
```
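The spreadsheet itself is not attached here. Assuming the Input sheet has one row per hotel and one column per year (2022–2027), a sketch using melt with a stable sort, which keeps each hotel's rows in their original order instead of scrambling them:

```python
import pandas as pd

# Hypothetical stand-in for the "Input" sheet: one row per hotel,
# one column per year.
df = pd.DataFrame({
    "Hotel Name": ["A", "B"],
    "2022": [10, 20],
    "2023": [11, 21],
})

# mergesort is a stable sort, so rows within each hotel keep their order.
long_df = (df.melt(id_vars="Hotel Name", var_name="Year", value_name="value")
             .sort_values("Hotel Name", kind="mergesort")
             .reset_index(drop=True))
print(long_df)
```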

Getting a ratio by iterating over two columns

Submitted by 此生再无相见时 on 2021-02-17 07:04:07

Question: Hi, my dataframe is as below:

Date      Key  y
1/2/2013  A    1
1/2/2013  B    2
1/2/2013  C    1
2/2/2013  A    1
2/2/2013  c    1
2/2/2013  B    3

I now want to create a new column "ratio": for a given date (1/2/2013), the ratio for key A would be y(A) / (y(A) + y(B) + y(C)), which is 1 / (1 + 2 + 1), i.e. 0.25. My final df would be as follows:

Date      Key  y  ratio
1/2/2013  A    1  0.25
1/2/2013  B    2  0.5
1/2/2013  C    1  0.25
2/2/2013  A    1  0.2
2/2/2013  c    1  0.2
2/2/2013  B    3  0.6

Really appreciate the help.

Answer 1: You can use groupby().transform('sum'):
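The answer is cut off above; presumably it divides each y by its date group's total, along these lines:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1/2/2013"] * 3 + ["2/2/2013"] * 3,
    "Key": ["A", "B", "C", "A", "c", "B"],
    "y": [1, 2, 1, 1, 1, 3],
})

# Divide each y by the total y of its date group.
df["ratio"] = df["y"] / df.groupby("Date")["y"].transform("sum")
print(df)
```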

Converting series from pandas to pyspark: need to use “groupby” and “size”, but pyspark yields error

Submitted by 不打扰是莪最后的温柔 on 2021-02-17 07:03:31

Question: I am converting some code from pandas to pyspark. In pandas, let's imagine I have the following mock dataframe, df (shown as an image in the original post). I define a certain variable the following way:

value = df.groupby(["Age", "Siblings"]).size()

and the output is a series (also shown in the original post). However, when trying to convert this to pyspark, an error comes up: AttributeError: 'GroupedData' object has no attribute 'size'. Can anyone help me solve this?

Answer 1: The equivalent of size in pyspark is count:

df.groupby(["Age", "Siblings"]).count()
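A runnable sketch of that answer, with invented column values:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(20, 1), (20, 1), (30, 2)],
    ["Age", "Siblings"],
)

# groupBy(...).count() adds a "count" column, playing the role of
# pandas' groupby(...).size().
df.groupby(["Age", "Siblings"]).count().show()
```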

Window of full weeks in pandas

Submitted by 旧时模样 on 2021-02-17 06:37:09

Question: I am looking for a special window function in pandas: sort of a combination of rolling and expanding. For calculating (for instance) the mean and standard deviation, I want to regard all past data, but ignore the first few records to make sure I have a multiple of 7 (days, in my case), because I know the data has a strong weekly pattern. Example:

```python
s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
              pd.date_range('2020-01-01', '2020-01-22'))
s.rolling(7, 7)
```
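There is no built-in window of this shape; a minimal sketch, assuming "ignore the first few records" means trimming the front of the expanding window so its length is the largest available multiple of 7:

```python
import pandas as pd

s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
              index=pd.date_range('2020-01-01', '2020-01-22'))

def full_weeks_mean(s: pd.Series) -> pd.Series:
    """Mean over all past data, trimmed at the front so the window
    length is the largest multiple of 7 available."""
    out = {}
    for i in range(len(s)):
        n = ((i + 1) // 7) * 7  # largest multiple of 7 <= records so far
        out[s.index[i]] = s.iloc[i + 1 - n : i + 1].mean() if n else float("nan")
    return pd.Series(out)

print(full_weeks_mean(s))
```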

How to replace certain values in pandas Series with its previous value?

Submitted by 戏子无情 on 2021-02-17 06:08:42

Question: I have a pandas Series object s like this:

>>> s
date
2020-03-26    19.72
2020-03-27    19.75
2020-03-30    19.43
2020-03-31    19.69
2020-04-01       --
2020-04-06    20.03
2020-04-07    20.45
2020-04-08    21.00
2020-04-09       --
2020-04-10    20.96
2020-04-13    20.75
2020-04-14    21.23
Name: price, dtype: object

>>> s.values
array(['19.72', '19.75', '19.43', '19.69', '--', '20.03', '20.45',
       '21.00', '20.82', '20.96', '20.75', '21.23'], dtype=object)

How can I replace -- with its previous value? I mean, I want s converted so that each -- is replaced by the value just before it.
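A minimal sketch: treat '--' as missing and forward-fill from the previous value:

```python
import numpy as np
import pandas as pd

s = pd.Series(['19.72', '19.75', '--', '19.69', '--', '20.03'], name='price')

# Replace the '--' placeholders with NaN, forward-fill, and
# (optionally) convert the strings to floats.
fixed = s.replace('--', np.nan).ffill().astype(float)
print(fixed)
```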

pandas update specific rows in specific columns in one dataframe based on another dataframe

Submitted by 时间秒杀一切 on 2021-02-17 06:05:53

Question: I have two dataframes, Big and Small, and I want to update Big based on the data in Small, only in specific columns. This is Big:

>>> Big
   ID   name country      city     hobby  age
0  12   Meli    Peru      Lima    eating  212
1  15   Saya     USA  new-york  drinking   34
2  34  Aitel  Jordan     Amman    riding   51
3  23  Tanya  Russia    Moscow    sports   75
4  44    Gil   Spain    Madrid    paella  743

and this is Small:

>>> Small
   ID     name country       city    hobby  age
0  12  Melinda    Peru       Lima   eating   24
4  44      Gil   Spain  Barcelona  friends   21

I would like to update the rows in Big
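The question is cut off before it names the columns to update; a sketch assuming, purely for illustration, that only name and age should be taken from Small, using DataFrame.update after aligning both frames on ID:

```python
import pandas as pd

big = pd.DataFrame({
    "ID": [12, 15, 34, 23, 44],
    "name": ["Meli", "Saya", "Aitel", "Tanya", "Gil"],
    "country": ["Peru", "USA", "Jordan", "Russia", "Spain"],
    "city": ["Lima", "new-york", "Amman", "Moscow", "Madrid"],
    "hobby": ["eating", "drinking", "riding", "sports", "paella"],
    "age": [212, 34, 51, 75, 743],
})
small = pd.DataFrame({
    "ID": [12, 44],
    "name": ["Melinda", "Gil"],
    "country": ["Peru", "Spain"],
    "city": ["Lima", "Barcelona"],
    "hobby": ["eating", "friends"],
    "age": [24, 21],
})

cols = ["name", "age"]                     # hypothetical "specific columns"
big = big.set_index("ID")
big.update(small.set_index("ID")[cols])    # aligns on ID, touches only cols
print(big.reset_index())
```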
