data-analysis

Pandas, Excel-Import and MultiIndex

拟墨画扇 submitted on 2019-12-24 02:13:49
Question: I am new to pandas and am currently trying to analyse Excel data in the following schema:

My goal is a visualisation with the index labels XYZ, CDE, EFG, HU on the x-axis and the corresponding Perc values of Yes, ProbYes, X, ProbNo, No stacked on the y-axis. Currently I'm parsing the Excel data into a pandas DataFrame via the code:

    import pandas as pd
    path = 'x1.xlsx'
    x = pd.ExcelFile(path)
    sheets = x.sheet_names
    table = x.parse(sheets[0], header=2)  # take line 2 as column
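The spreadsheet layout is not shown in this excerpt, so the following is only a sketch under an assumed layout: one row per index label and two-level columns of the form (answer, measure), where the measure names "Count" and "Perc" and all numbers are hypothetical. It shows one way to pull out just the Perc values so that each answer becomes one column, which is the shape a stacked bar chart needs.

```python
import pandas as pd

# Hypothetical counts per index label; the real spreadsheet is not shown.
answers = ["Yes", "ProbYes", "X", "ProbNo", "No"]
counts = {
    "XYZ": [4, 2, 1, 2, 1],
    "CDE": [1, 3, 2, 2, 2],
    "EFG": [5, 1, 1, 1, 2],
    "HU": [2, 2, 2, 2, 2],
}

# Assumed layout: two-level columns (answer, measure).
columns = pd.MultiIndex.from_product([answers, ["Count", "Perc"]])
rows = []
for label, row_counts in counts.items():
    total = sum(row_counts)
    row = []
    for n in row_counts:
        row += [n, 100 * n / total]  # count followed by its percentage
    rows.append(row)
table = pd.DataFrame(rows, index=list(counts), columns=columns)

# Select the "Perc" measure from the second column level.
# The result has one plain column per answer.
perc = table.xs("Perc", axis=1, level=1)
print(perc)
```

With matplotlib installed, `perc.plot(kind="bar", stacked=True)` would then draw the index labels on the x-axis and stack the percentages on the y-axis.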

Filter Pandas Data frame

雨燕双飞 submitted on 2019-12-24 01:54:13
Question: I have this pandas dataframe:

                           open    high     low   close      volume
    TimeStamp
    2016-06-23 10:00:00  586.76  594.00  585.54  589.94  478.176973
    2016-06-23 11:00:00  589.94  595.49  588.23  592.63  448.689485
    2016-06-23 12:00:00  592.63  592.63    1.50  581.13  527.816891
    2016-06-23 13:00:00  581.13  586.33  578.58  580.96  728.424757

As you can see, one of the values is not OK, so I want to filter it out and change it to the mean of the last 5 values. With this

    df['avg']=df['low'].rolling(5).mean().shift()

I get this: open high low
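One possible way to finish the idea, as a sketch only: the question does not say how an outlier is defined, so the rule used here (a value below half of the mean of the preceding rows) is an assumption, as are the names `prev_mean`, `is_outlier` and `low_clean`. `min_periods=1` is added because a window of 5 would otherwise yield only NaN on this 4-row sample.

```python
import pandas as pd

# The "low" column from the question, indexed by timestamp.
df = pd.DataFrame(
    {"low": [585.54, 588.23, 1.50, 578.58]},
    index=pd.to_datetime(
        [
            "2016-06-23 10:00:00",
            "2016-06-23 11:00:00",
            "2016-06-23 12:00:00",
            "2016-06-23 13:00:00",
        ]
    ),
)
df.index.name = "TimeStamp"

# Mean of up to 5 preceding rows; shift() excludes the current row.
prev_mean = df["low"].rolling(5, min_periods=1).mean().shift()

# Assumed outlier rule: less than half of the preceding mean.
# Comparisons against NaN (first row) evaluate to False.
is_outlier = df["low"] < 0.5 * prev_mean

# Replace only the flagged values with the preceding mean.
df["low_clean"] = df["low"].mask(is_outlier, prev_mean)
print(df)
```

Note that `prev_mean` is computed from the raw column, so rows following an outlier still see it in their window; that only matters if the rule is applied repeatedly.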

ValueError: Can only compare identically-labeled Series objects

旧城冷巷雨未停 submitted on 2019-12-24 01:23:56
Question: Here is my code. No matter what I do, I keep getting the error, even after following all the index-related solutions. Can anyone help me?

    site = pd.read_csv('../data/survey_site.csv')
    sampled = site.sample(n=1)
    site = site.reset_index(drop=True)
    sampled = sampled.reset_index(drop=True)
    mask = site.mask(site['name'] == sampled['name'])

Answer 1: The issue is that the comparison between site['name'] and sampled['name'] is between two pd.Series. You can bypass that by making one of them a scalar. However, I
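A minimal sketch of the "make one of them a scalar" suggestion, using a small hypothetical frame in place of survey_site.csv (the column `value` and the fixed `random_state` are illustration only). The error arises because `==` between two Series requires identical labels, and a one-row sample cannot match the full frame.

```python
import pandas as pd

# Hypothetical stand-in for '../data/survey_site.csv'.
site = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})

# Draw one row; random_state only makes the example repeatable.
sampled = site.sample(n=1, random_state=0).reset_index(drop=True)

# Extract the single name as a scalar instead of a one-row Series.
name = sampled["name"].iloc[0]

# Series == scalar broadcasts element-wise and never raises the
# "identically-labeled" error.
is_sampled = site["name"] == name

# Rows that were not sampled.
remaining = site[~is_sampled]
print(remaining)
```

If several rows are sampled, `site["name"].isin(sampled["name"])` gives the same kind of boolean mask without needing a scalar.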

matching keys from two different dataframes

 ̄綄美尐妖づ submitted on 2019-12-24 00:43:30
Question: I have two dataframes.

df1,

    Name Stage Description key
    0 Sri 1 Sri is one of the good singer in this two one
    1 NaN 2 Thanks for reading two has
    2 Ram 1 Ram is two of the good cricket player three
    3 ganesh 1 one driver four
    4 NaN 2 good buddies NaN

df2,

    values member of four one of three friends sri is a cricketer Rahul has two brothers

I want to replace df1["key"] with the df2 values if the key is present in df2.values. I tried:

    df1["key"]=df2[df2["values"].str.contains("|".join(df2["values"
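The question is cut off before the expected output, so the following is only one reading of it: each key in df1 is replaced by the first df2 value that contains it as a substring, and keys without a match are left alone. The row boundaries of df2, the simplified df1 with a single `key` column, and the helper name `first_match` are all assumptions made for illustration.

```python
import pandas as pd

# Simplified, hypothetical versions of the two frames.
df1 = pd.DataFrame({"key": ["one", "two", "three", "four", None]})
df2 = pd.DataFrame(
    {
        "values": [
            "member of four",
            "one of three friends",
            "sri is a cricketer",
            "Rahul has two brothers",
        ]
    }
)


def first_match(key):
    """Return the first df2 value containing `key`, else `key` unchanged."""
    if pd.isna(key):
        return key
    # regex=False treats the key as a literal substring.
    hits = df2.loc[df2["values"].str.contains(key, regex=False), "values"]
    return hits.iloc[0] if not hits.empty else key


df1["key"] = df1["key"].map(first_match)
print(df1)
```

Substring matching is loose (a key such as "one" would also match a word containing those letters), so word-boundary matching may be preferable on real data.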

Python Pandas - How to format and split a text in column ?

我是研究僧i submitted on 2019-12-24 00:13:11
Question: I have a set of strings in a dataframe like below:

    ID  TextColumn
    1   This is line number one
    2   I love pandas, they are so puffy
    3   [This $tring is with specia| characters, yes it is!]

A. I want to format these strings to eliminate all the special characters.
B. Once formatted, I'd like to get a list of unique words (space being the only separator).

Here is the code I have written. The get_df_by_id dataframe holds one selected row, say ID 3.

    #replace all special characters
    formatted_title = get_df_by_id[
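A sketch of steps A and B on the sample rows. Which characters count as "special" is not stated in the question, so the rule here (keep only ASCII letters, digits and spaces) is an assumption, as are the variable names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "TextColumn": [
            "This is line number one",
            "I love pandas, they are so puffy",
            "[This $tring is with specia| characters, yes it is!]",
        ],
    }
)

# A. Drop everything except ASCII letters, digits and spaces (assumed rule).
cleaned = df["TextColumn"].str.replace(r"[^A-Za-z0-9 ]+", "", regex=True)

# B. Split on single spaces, flatten to one word per row, and keep
# the unique non-empty words in order of first appearance.
words = cleaned.str.split(" ").explode()
unique_words = words[words != ""].unique().tolist()

print(cleaned.iloc[2])
print(unique_words)
```

The comparison is case-sensitive, so "This" and "this" would count as different words; lower-casing `cleaned` first would merge them.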

How to fill a particular value with mean value of the column between first row and the corresponding row in pandas dataframe

不羁岁月 submitted on 2019-12-23 20:04:23
Question: I have a df like this:

    A  B  C  D  E
    1  2  3  0  2
    2  0  7  1  1
    3  4  0  3  0
    0  0  3  4  3

I am trying to replace every 0 with the mean() of the values between the first row and the row containing the 0, for the corresponding column. My expected output is:

    A    B     C         D    E
    1.0  2.00  3.000000  0.0  2.0
    2.0  1.00  7.000000  1.0  1.0
    3.0  4.00  3.333333  3.0  1.0
    1.5  1.75  3.000000  4.0  3.0

Answer 1: The main problem is that, when a column contains several zeros, each replacement needs the previously computed mean values, so a vectorized solution is really hard to create:

    def f(x): for i, v in enumerate(x): if
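The answer's function is cut off, so the following is not a reconstruction of it, only a sketch of the same loop-based idea: walk down each column and, whenever a 0 is met, overwrite it with the mean of everything from the first row through that row, using the values already replaced above it. The function name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 2, 3, 0],
        "B": [2, 0, 4, 0],
        "C": [3, 7, 0, 3],
        "D": [0, 1, 3, 4],
        "E": [2, 1, 0, 3],
    }
)


def fill_zeros_with_running_mean(col):
    """Replace each 0 with the mean of rows 0..i, reusing earlier fills."""
    values = col.astype(float).tolist()
    for i, v in enumerate(values):
        if v == 0:
            # The slice includes the 0 itself, matching the expected output
            # in the question (e.g. column B: mean(2, 0) = 1.0).
            values[i] = sum(values[: i + 1]) / (i + 1)
    return pd.Series(values, index=col.index)


result = df.apply(fill_zeros_with_running_mean)
print(result)
```

Because each fill depends on earlier fills in the same column, the loop runs sequentially per column; `apply` merely repeats it for every column.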

Best fit curve for trend line

旧街凉风 submitted on 2019-12-23 10:14:43
Question:

Problem

Constraints
- Size of the data set, but not the data itself, is known.
- Data set grows by one data point at a time.
- Trend line is graphed one data point at a time (using a spline/Bezier curve).

Graphs

The collage below shows data sets with reasonably accurate trend lines:

The graphs are:
- Upper-left. By hour, with ~24 data points.
- Upper-right. By day for one year, with ~365 data points.
- Lower-left. By week for one year, with ~52 data points.
- Lower-right. By month for one year, with ~12

Is a column in pandas.DF() monotonically increasing?

*爱你&永不变心* submitted on 2019-12-23 07:55:44
Question: I can check whether the index of a pandas.DataFrame() is monotonically increasing by using the is_monotonic method. However, I would like to check whether the values (float/integer) of one of the columns are strictly increasing.

    In [13]: my_df = pd.DataFrame([1,2,3,5,7,6,9])

    In [14]: my_df
    Out[14]:
       0
    0  1
    1  2
    2  3
    3  5
    4  7
    5  6
    6  9

    In [15]: my_df.index.is_monotonic
    Out[15]: True

Answer 1: Pandas 0.19 added a public Series.is_monotonic API (previously, this was available only in the undocumented algos module).
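A short sketch of the column check on the question's data. It uses `is_monotonic_increasing` rather than `is_monotonic`, since the latter was deprecated in pandas 1.5 and removed in 2.0. `is_monotonic_increasing` allows equal neighbours, so combining it with `is_unique` is one way (an assumption about what "strictly" should mean here) to require strict increase.

```python
import pandas as pd

my_df = pd.DataFrame([1, 2, 3, 5, 7, 6, 9])
col = my_df[0]

# Non-decreasing check: fails here because 7 is followed by 6.
non_decreasing = col.is_monotonic_increasing

# Strictly increasing: non-decreasing and no repeated values.
strictly_increasing = col.is_monotonic_increasing and col.is_unique

# A column with a tie is non-decreasing but not strictly increasing.
tied = pd.Series([1, 2, 2, 3])
tied_non_decreasing = tied.is_monotonic_increasing
tied_strict = tied.is_monotonic_increasing and tied.is_unique

print(non_decreasing, strictly_increasing, tied_non_decreasing, tied_strict)
```

The same properties exist on an index, so `my_df.index.is_monotonic_increasing` replaces the `my_df.index.is_monotonic` call on current pandas versions.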

Training images and test images

假装没事ソ submitted on 2019-12-23 03:59:08
Question: I am working on a project about the feedforward pathway of the ventral stream, and I have 6 images to be recognized at the inferotemporal layer. Could someone give me example images that show the difference between training images and test images? What should I add to the folder that contains my training images? Should I add another folder that contains a list of test images? If yes, what should these test images be? Must the training images contain the images