data-analysis | 易学教程

Pandas, Excel-Import and MultiIndex

Question: I am new to pandas and am currently trying to analyse Excel data in the following schema (not shown here). My goal is a visualisation with the index labels XYZ, CDE, EFG, HU on the x-axis and the corresponding Perc values of Yes, ProbYes, X, ProbNo, No stacked on the y-axis. Currently I'm parsing the Excel data into a pandas DataFrame via the code:

    import pandas as pd

    path = 'x1.xlsx'
    x = pd.ExcelFile(path)
    sheets = x.sheet_names
    table = x.parse(sheets[0], header=2) # take line 2 as column

Filter Pandas Data frame

Question: I have this pandas DataFrame:

                           open    high     low   close      volume
    TimeStamp
    2016-06-23 10:00:00  586.76  594.00  585.54  589.94  478.176973
    2016-06-23 11:00:00  589.94  595.49  588.23  592.63  448.689485
    2016-06-23 12:00:00  592.63  592.63    1.50  581.13  527.816891
    2016-06-23 13:00:00  581.13  586.33  578.58  580.96  728.424757

As you can see, one of the values is not OK, so I want to filter it out and replace it with the mean of the last 5 values. With this

    df['avg']=df['low'].rolling(5).mean().shift()

I get this open high low
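The replacement the question describes could be sketched as follows. This is only an illustration under stated assumptions: the outlier rule (a value below half of the preceding average) and `min_periods=1` are not in the original question and are chosen here so that the four sample rows produce a result.

```python
import pandas as pd

# The 'low' column from the sample rows in the question.
low = pd.Series(
    [585.54, 588.23, 1.50, 578.58],
    index=pd.to_datetime([
        "2016-06-23 10:00:00",
        "2016-06-23 11:00:00",
        "2016-06-23 12:00:00",
        "2016-06-23 13:00:00",
    ]),
    name="low",
)

# Mean of up to 5 preceding values; shift() moves the window so the
# current row is excluded. min_periods=1 is an assumption that lets the
# short sample yield numbers instead of NaN.
prev_avg = low.rolling(5, min_periods=1).mean().shift()

# Hypothetical outlier rule: a value below half of the preceding average.
# Comparisons against NaN (the first row) evaluate to False.
is_outlier = low < 0.5 * prev_avg

# mask() replaces values where the condition is True.
cleaned = low.mask(is_outlier, prev_avg)
```

Note that the bad value still feeds into the averages of later rows, so on longer data it may be worth recomputing the rolling mean after cleaning.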

ValueError: Can only compare identically-labeled Series objects

Question: Here is my code. No matter what I do, I keep getting the error, even though I have followed all the index-related solutions. Can anyone help me?

    site = pd.read_csv('../data/survey_site.csv')
    sampled = site.sample(n=1)
    site = site.reset_index(drop=True)
    sampled = sampled.reset_index(drop=True)
    mask = site.mask(site['name'] == sampled['name'])

Answer 1: The issue is that the comparison between site['name'] and sampled['name'] is between two pd.Series objects. You can bypass that by making one of them a scalar. However, I
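A minimal sketch of the scalar approach the answer mentions, using a small made-up table in place of survey_site.csv (the names A, B, C and the fixed `random_state` are assumptions for illustration). Extracting the single sampled name with `.iloc[0]` turns the comparison into Series versus scalar, which does not require matching labels.

```python
import pandas as pd

# Hypothetical stand-in for survey_site.csv; only the 'name' column matters here.
site = pd.DataFrame({"name": ["A", "B", "C"]})

# Draw one row; random_state is fixed only to make the example repeatable.
sampled = site.sample(n=1, random_state=0)

# Pull the one sampled name out as a plain Python value (a scalar).
sampled_name = sampled["name"].iloc[0]

# Series == scalar compares element-wise, so the label error cannot occur.
is_sampled = site["name"] == sampled_name

# Keep only the rows that were not sampled.
remaining = site[~is_sampled]
```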

matching keys from two different dataframes

Question: I have two dataframes. df1: Name Stage Description key 0 Sri 1 Sri is one of the good singer in this two one 1 NaN 2 Thanks for reading two has 2 Ram 1 Ram is two of the good cricket player three 3 ganesh 1 one driver four 4 NaN 2 good buddies NaN. df2: values member of four one of three friends sri is a cricketer Rahul has two brothers. I want to replace df1["key"] with the df2 values if the key is present in df2.values. I tried: df1["key"]=df2[df2["values"].str.contains("|".join(df2["values"

Python Pandas - How to format and split text in a column?

Question: I have a set of strings in a dataframe like below:

    ID  TextColumn
    1   This is line number one
    2   I love pandas, they are so puffy
    3   [This $tring is with specia| characters, yes it is!]

A. I want to format this string to eliminate all the special characters.
B. Once formatted, I'd like to get a list of unique words (space being the only split).

Here is the code I have written. The get_df_by_id dataframe has one selected frame, say ID 3.

    #replace all special characters
    formatted_title = get_df_by_id[
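Steps A and B could be sketched in plain Python as follows. The helper name `unique_words` is hypothetical, and the definition of a "special character" (anything other than ASCII letters, digits, and spaces) is an assumption, since the question does not define it.

```python
import re

def unique_words(text):
    """Strip special characters, then return the unique words in first-seen order."""
    # Assumption: a "special character" is anything other than
    # ASCII letters, digits, and spaces.
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", "", text)
    # Only spaces remain as separators after cleaning, so split() divides on them.
    # dict.fromkeys drops repeats while keeping the order of first appearance.
    return list(dict.fromkeys(cleaned.split()))

words = unique_words("[This $tring is with specia| characters, yes it is!]")
```

The comparison is case-sensitive, so "This" and "this" would count as different words. On a DataFrame column the same function could be applied row by row, for example with `df["TextColumn"].apply(unique_words)`.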

How to fill a particular value with mean value of the column between first row and the corresponding row in pandas dataframe

Question: I have a df like this:

    A  B  C  D  E
    1  2  3  0  2
    2  0  7  1  1
    3  4  0  3  0
    0  0  3  4  3

I am trying to replace every 0 with the mean() of the values between the first row and the row containing the 0, for the corresponding column. My expected output is:

      A     B         C    D    E
    1.0  2.00  3.000000  0.0  2.0
    2.0  1.00  7.000000  1.0  1.0
    3.0  4.00  3.333333  3.0  1.0
    1.5  1.75  3.000000  4.0  3.0

Answer 1: The main problem is that the previous mean value is needed when a column contains multiple 0 values, so creating a vectorized solution is really problematic:

    def f(x):
        for i, v in enumerate(x):
            if
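The answer's code is cut off above, so the following is not a reconstruction of it. It is a separate loop-based sketch that reproduces the expected output from the question; the function name `fill_zeros_with_running_mean` is hypothetical.

```python
import pandas as pd

def fill_zeros_with_running_mean(values):
    """Replace each 0 with the mean of the column from the first row up to and including it."""
    out = []
    for v in values:
        out.append(float(v))
        if v == 0:
            # The mean uses earlier replacements and counts the 0 itself,
            # which is what the expected output in the question implies.
            out[-1] = sum(out) / len(out)
    return out

df = pd.DataFrame({
    "A": [1, 2, 3, 0],
    "B": [2, 0, 4, 0],
    "C": [3, 7, 0, 3],
    "D": [0, 1, 3, 4],
    "E": [2, 1, 0, 3],
})

# Process each column independently, keeping the original index.
result = df.apply(
    lambda col: pd.Series(fill_zeros_with_running_mean(col), index=col.index)
)
```

Because each replacement depends on the ones before it in the same column, the loop runs sequentially per column, in line with the answer's remark that a vectorized solution is hard to write.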

Best fit curve for trend line

Question: Problem

Constraints:
    Size of the data set, but not the data itself, is known.
    Data set grows by one data point at a time.
    Trend line is graphed one data point at a time (using a spline/Bezier curve).

Graphs: The collage below shows data sets with reasonably accurate trend lines. The graphs are:
    Upper-left. By hour, with ~24 data points.
    Upper-right. By day for one year, with ~365 data points.
    Lower-left. By week for one year, with ~52 data points.
    Lower-right. By month for one year, with ~12

Is a column in pandas.DF() monotonically increasing?

Question: I can check whether the index of a pandas.DataFrame() is monotonically increasing by using the is_monotonic method. However, I would like to check whether the values of one of the columns (float/integer) are strictly increasing.

    In [13]: my_df = pd.DataFrame([1,2,3,5,7,6,9])

    In [14]: my_df
    Out[14]:
       0
    0  1
    1  2
    2  3
    3  5
    4  7
    5  6
    6  9

    In [15]: my_df.index.is_monotonic
    Out[15]: True

Answer 1: Pandas 0.19 added a public Series.is_monotonic API (previously, this was available only in the undocumented algos module).
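A short sketch of both checks on the question's data. It uses `Series.is_monotonic_increasing` rather than `is_monotonic`, since newer pandas releases no longer provide the shorter alias. The strict test via `diff()` is one possible approach, not necessarily the one the answer goes on to recommend.

```python
import pandas as pd

my_df = pd.DataFrame([1, 2, 3, 5, 7, 6, 9])
col = my_df[0]

# Non-strict check: True only if every value is >= the one before it.
non_strict = col.is_monotonic_increasing

# Strict check: every consecutive difference must be positive.
# diff() leaves NaN in the first position, so drop it before testing.
strict = bool((col.diff().dropna() > 0).all())

# A series with a repeated value separates the two checks:
# it is monotonic but not strictly increasing.
with_tie = pd.Series([1, 2, 2, 3])
tie_non_strict = with_tie.is_monotonic_increasing
tie_strict = bool((with_tie.diff().dropna() > 0).all())
```

For the question's column both checks come out false because of the step from 7 to 6.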

Training images and test images

Question: I am working on a project about the feedforward pathway of the ventral stream, and I have 6 images to be recognized at the inferotemporal layer. Could someone please give me example images that show the difference between training images and test images? What should I add to the folder that contains my training images? Should I add another folder that contains a list of test images? If yes, what should these test images be? Must the training images contain the images