pandas

Heikin-Ashi using pandas (Python)

Submitted by 我怕爱的太早我们不能终老 on 2021-02-17 09:06:38

Question: I was writing a function for Heikin-Ashi, one of the popular chart types in technical analysis, using pandas, but ran into some difficulty. These are the Heikin-Ashi candle calculations:

HA_Close = (Open + High + Low + Close) / 4
HA_Open  = (previous HA_Open + previous HA_Close) / 2
HA_Low   = minimum of Low, HA_Open, and HA_Close
HA_High  = maximum of High, HA_Open, and HA_Close

and on the first run:

HA_Close = (Open + High + Low + Close) / 4
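A minimal sketch of these formulas in pandas, assuming a DataFrame with Open/High/Low/Close columns. Seeding the first HA_Open with (Open + Close) / 2 is a common convention but an assumption here, since the question is cut off before its first-run rule for HA_Open:

```python
import pandas as pd

def heikin_ashi(df: pd.DataFrame) -> pd.DataFrame:
    """Compute Heikin-Ashi candles from regular OHLC data."""
    ha = pd.DataFrame(index=df.index)
    ha["HA_Close"] = (df["Open"] + df["High"] + df["Low"] + df["Close"]) / 4
    # HA_Open is recursive, so build it iteratively; the first bar is
    # seeded with (Open + Close) / 2 (an assumption, see above).
    ha_open = [(df["Open"].iloc[0] + df["Close"].iloc[0]) / 2]
    for i in range(1, len(df)):
        ha_open.append((ha_open[i - 1] + ha["HA_Close"].iloc[i - 1]) / 2)
    ha["HA_Open"] = ha_open
    ha["HA_High"] = pd.concat(
        [df["High"], ha["HA_Open"], ha["HA_Close"]], axis=1).max(axis=1)
    ha["HA_Low"] = pd.concat(
        [df["Low"], ha["HA_Open"], ha["HA_Close"]], axis=1).min(axis=1)
    return ha

ohlc = pd.DataFrame({"Open": [10, 11], "High": [12, 13],
                     "Low": [9, 10], "Close": [11, 12]})
print(heikin_ashi(ohlc))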

Print predict ValueError: Expected 2D array, got 1D array instead

Submitted by ぃ、小莉子 on 2021-02-17 07:19:29

Question: The error shows up in my last two lines of code:

ValueError: Expected 2D array, got 1D array instead: array=[0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit
%matplotlib inline

df = pd.read_csv('.......csv')
df.drop(['Company'], 1, inplace=True)
x = pd.DataFrame(df.drop(['R&D Expense'], 1))
y = pd.DataFrame(df['R&D Expense'])
```
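The failing lines themselves are cut off above, but the error message points at the fix: scikit-learn estimators expect a 2D array of shape (n_samples, n_features), so a single sample has to be reshaped before calling predict. A minimal sketch with invented toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: three samples, two features each.
X = np.array([[0, 1], [1, 2], [2, 3]])
y = np.array([1.0, 2.0, 3.0])
model = LinearRegression().fit(X, y)

sample = np.array([0, 1])                    # 1D, shape (2,) -> raises the ValueError
print(model.predict(sample.reshape(1, -1)))  # 2D, shape (1, 2): one sample, two features
```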

Get column name based on condition in pandas

Submitted by [亡魂溺海] on 2021-02-17 07:18:07

Question: I have a dataframe (shown as an image in the original post). For each row, I want to get the names of the columns that contain a 1, e.g. for row 1: Blanks; for row 2: Manufacturing; for row 3: Manufacturing; for row 4: Manufacturing; for row 5: Social, Finance, Analytics, Advertising. Right now I am only able to get the complete row:

```python
primary_sectors = lambda primary_sector: sectors[
    sectors["category_list"] == primary_sector
]
```

Please help me get the name of the column.
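The original dataframe is not available here; a minimal sketch against an invented one-hot frame of the same shape, joining the names of the columns that hold a 1 in each row:

```python
import pandas as pd

# Invented stand-in for the one-hot sector dataframe in the question.
df = pd.DataFrame({
    "Blanks":        [1, 0, 0, 0, 0],
    "Manufacturing": [0, 1, 1, 1, 0],
    "Social":        [0, 0, 0, 0, 1],
    "Finance":       [0, 0, 0, 0, 1],
}, index=range(1, 6))

# For each row, keep the column names where the value is 1.
names = df.eq(1).apply(lambda row: ", ".join(df.columns[row]), axis=1)
print(names)
```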

Python Pandas: How to convert my table from a long format to wide format (specific example below)?

Submitted by 我与影子孤独终老i on 2021-02-17 07:09:49

Question: Pretty much the title. I am attaching the spreadsheet here; I need to convert the "Input" sheet to the "Output" sheet. I know about pandas wide_to_long, but I haven't been able to use it to produce the desired output: the rows get scrambled up.

```python
import pandas as pd

df = pd.read_excel('../../Downloads/test.xlsx', sheet_name='Input', header=0)
newdf = (pd.wide_to_long(df, [str(i) for i in range(2022, 2028)],
                         'Hotel Name', 'value', sep='', suffix='.+')
         .reset_index()
         .sort_values('Hotel Name'))
```
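The spreadsheet itself is not attached here. Assuming the Input sheet has one row per hotel and one column per year (2022–2027), a sketch using melt with a stable sort, which keeps each hotel's rows in their original order instead of scrambling them:

```python
import pandas as pd

# Hypothetical stand-in for the "Input" sheet: one row per hotel,
# one column per year.
df = pd.DataFrame({
    "Hotel Name": ["A", "B"],
    "2022": [10, 20],
    "2023": [11, 21],
})

# mergesort is a stable sort, so rows within each hotel keep their order.
long_df = (df.melt(id_vars="Hotel Name", var_name="Year", value_name="value")
             .sort_values("Hotel Name", kind="mergesort")
             .reset_index(drop=True))
print(long_df)
```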

Getting a ratio by iterating over two columns

Submitted by 此生再无相见时 on 2021-02-17 07:04:07

Question: Hi, my dataframe is as below:

Date      Key  y
1/2/2013  A    1
1/2/2013  B    2
1/2/2013  C    1
2/2/2013  A    1
2/2/2013  c    1
2/2/2013  B    3

I now want to create a new column "ratio": for a given date (1/2/2013), the ratio for key A would be y(A) / (y(A) + y(B) + y(C)), which is 1 / (1 + 2 + 1), i.e. 0.25. My final df would be as follows:

Date      Key  y  ratio
1/2/2013  A    1  0.25
1/2/2013  B    2  0.5
1/2/2013  C    1  0.25
2/2/2013  A    1  0.2
2/2/2013  c    1  0.2
2/2/2013  B    3  0.6

Really appreciate the help.

Answer 1: You can use groupby().transform('sum'):
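The answer is cut off above; presumably it divides each y by its date group's total, along these lines:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1/2/2013"] * 3 + ["2/2/2013"] * 3,
    "Key": ["A", "B", "C", "A", "c", "B"],
    "y": [1, 2, 1, 1, 1, 3],
})

# Divide each y by the total y of its date group.
df["ratio"] = df["y"] / df.groupby("Date")["y"].transform("sum")
print(df)
```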

Converting series from pandas to pyspark: need to use “groupby” and “size”, but pyspark yields error

Submitted by 不打扰是莪最后的温柔 on 2021-02-17 07:03:31

Question: I am converting some code from pandas to pyspark. In pandas, let's imagine I have the following mock dataframe, df (shown as an image in the original post). I define a certain variable the following way:

value = df.groupby(["Age", "Siblings"]).size()

and the output is a series (also shown in the original post). However, when trying to convert this to pyspark, an error comes up: AttributeError: 'GroupedData' object has no attribute 'size'. Can anyone help me solve this?

Answer 1: The equivalent of size in pyspark is count:

df.groupby(["Age", "Siblings"]).count()
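A runnable sketch of that answer, with invented column values:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(20, 1), (20, 1), (30, 2)],
    ["Age", "Siblings"],
)

# groupBy(...).count() adds a "count" column, playing the role of
# pandas' groupby(...).size().
df.groupby(["Age", "Siblings"]).count().show()
```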

Window of full weeks in pandas

Submitted by 旧时模样 on 2021-02-17 06:37:09

Question: I am looking for a special window function in pandas: sort of a combination of rolling and expanding. For calculating (for instance) the mean and standard deviation, I want to regard all past data, but ignore the first few records to make sure I have a multiple of 7 (days, in my case), because I know the data has a strong weekly pattern. Example:

```python
s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
              pd.date_range('2020-01-01', '2020-01-22'))
s.rolling(7, 7)
```
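There is no built-in window of this shape; a minimal sketch, assuming "ignore the first few records" means trimming the front of the expanding window so its length is the largest available multiple of 7:

```python
import pandas as pd

s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
              index=pd.date_range('2020-01-01', '2020-01-22'))

def full_weeks_mean(s: pd.Series) -> pd.Series:
    """Mean over all past data, trimmed at the front so the window
    length is the largest multiple of 7 available."""
    out = {}
    for i in range(len(s)):
        n = ((i + 1) // 7) * 7  # largest multiple of 7 <= records so far
        out[s.index[i]] = s.iloc[i + 1 - n : i + 1].mean() if n else float("nan")
    return pd.Series(out)

print(full_weeks_mean(s))
```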

How to replace certain values in pandas Series with its previous value?

Submitted by 戏子无情 on 2021-02-17 06:08:42

Question: I have a pandas Series object s like this:

>>> s
date
2020-03-26    19.72
2020-03-27    19.75
2020-03-30    19.43
2020-03-31    19.69
2020-04-01       --
2020-04-06    20.03
2020-04-07    20.45
2020-04-08    21.00
2020-04-09       --
2020-04-10    20.96
2020-04-13    20.75
2020-04-14    21.23
Name: price, dtype: object

>>> s.values
array(['19.72', '19.75', '19.43', '19.69', '--', '20.03', '20.45',
       '21.00', '20.82', '20.96', '20.75', '21.23'], dtype=object)

How can I replace -- with its previous value? I mean, I want s converted so that each -- is replaced by the value just before it.
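A minimal sketch: treat '--' as missing and forward-fill from the previous value:

```python
import numpy as np
import pandas as pd

s = pd.Series(['19.72', '19.75', '--', '19.69', '--', '20.03'], name='price')

# Replace the '--' placeholders with NaN, forward-fill, and
# (optionally) convert the strings to floats.
fixed = s.replace('--', np.nan).ffill().astype(float)
print(fixed)
```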

pandas update specific rows in specific columns in one dataframe based on another dataframe

Submitted by 时间秒杀一切 on 2021-02-17 06:05:53

Question: I have two dataframes, Big and Small, and I want to update Big based on the data in Small, only in specific columns. This is Big:

>>> Big
   ID   name country      city     hobby  age
0  12   Meli    Peru      Lima    eating  212
1  15   Saya     USA  new-york  drinking   34
2  34  Aitel  Jordan     Amman    riding   51
3  23  Tanya  Russia    Moscow    sports   75
4  44    Gil   Spain    Madrid    paella  743

and this is Small:

>>> Small
   ID     name country       city    hobby  age
0  12  Melinda    Peru       Lima   eating   24
4  44      Gil   Spain  Barcelona  friends   21

I would like to update the rows in Big
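The question is cut off before it names the columns to update; a sketch assuming, purely for illustration, that only name and age should be taken from Small, using DataFrame.update after aligning both frames on ID:

```python
import pandas as pd

big = pd.DataFrame({
    "ID": [12, 15, 34, 23, 44],
    "name": ["Meli", "Saya", "Aitel", "Tanya", "Gil"],
    "country": ["Peru", "USA", "Jordan", "Russia", "Spain"],
    "city": ["Lima", "new-york", "Amman", "Moscow", "Madrid"],
    "hobby": ["eating", "drinking", "riding", "sports", "paella"],
    "age": [212, 34, 51, 75, 743],
})
small = pd.DataFrame({
    "ID": [12, 44],
    "name": ["Melinda", "Gil"],
    "country": ["Peru", "Spain"],
    "city": ["Lima", "Barcelona"],
    "hobby": ["eating", "friends"],
    "age": [24, 21],
})

cols = ["name", "age"]                     # hypothetical "specific columns"
big = big.set_index("ID")
big.update(small.set_index("ID")[cols])    # aligns on ID, touches only cols
print(big.reset_index())
```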
