data-analysis

JMeter soap response - data analysis

Submitted by 夙愿已清 on 2019-12-12 01:04:36
Question: I am doing some data analysis on address data. The data for the analysis is generated by calling a SOAP web service, which returns a SOAP response. In each SOAP response I am interested in only one specific field, i.e. 'matchType' in the example shown below. 'matchType' can occur multiple times, up to a maximum of 20. I have 500 addresses, for which I get 500 responses similar to the one shown below. I am using JMeter to fire 500 SOAP requests at the web service. Problem: How can I create the final
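
The question is cut off before the example response and the final goal, but a common way to get at a single field like 'matchType' is to save the responses JMeter collects and post-process them outside JMeter. A minimal sketch under that assumption (the responses/*.xml layout and the element name are assumptions, since the sample response is not shown in this excerpt):

    # Sketch: tally 'matchType' values across saved SOAP response files.
    # Assumes JMeter wrote each response to its own XML file; adjust the glob
    # pattern and element name to match the real data.
    import glob
    from collections import Counter
    import xml.etree.ElementTree as ET

    counts = Counter()
    for path in glob.glob('responses/*.xml'):
        root = ET.parse(path).getroot()
        for elem in root.iter():
            # compare on the local name so XML namespaces don't get in the way
            if elem.tag.split('}')[-1] == 'matchType':
                counts[elem.text] += 1

    print(counts)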

Stata: combining regression results with other results

Submitted by 自作多情 on 2019-12-11 21:10:58
Question: I am trying to replicate some results from a study; therefore I often need to compare my regression results with the results from the study I am replicating. I have been manually combining my esttab results with the study results in Excel. This, however, is tedious, since I am working with a lot of variables. I was wondering whether there is a way to store the study results and then call them up next to my regression results. I tried storing them as scalars and calling them using
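
The Stata-side answer is not included in this excerpt. As an illustration only, the underlying idea of lining your estimates up against stored reference values can be sketched in pandas rather than Stata (variable names and numbers below are placeholders, not values from any study):

    # Sketch: put replicated coefficients next to the study's coefficients.
    # Both tables are keyed by variable name; all values are placeholders.
    import pandas as pd

    mine = pd.DataFrame({'var': ['educ', 'exper', '_cons'],
                         'coef_mine': [0.092, 0.041, 5.30]})
    study = pd.DataFrame({'var': ['educ', 'exper', '_cons'],
                          'coef_study': [0.095, 0.039, 5.25]})

    comparison = mine.merge(study, on='var', how='left')
    comparison['diff'] = comparison['coef_mine'] - comparison['coef_study']
    print(comparison)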

Ambiguous truth value with boolean logic

Submitted by ⅰ亾dé卋堺 on 2019-12-11 20:17:55
Question: I am trying to use some boolean logic in a function on a dataframe, but get an error:

    In [4]: data = {'level': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31]}
            frame = DataFrame(data)
            frame
    Out[4]:
        level
    0      20
    1      19
    2      20
    3      21
    4      25
    5      29
    6      30
    7      31
    8      30
    9      29
    10     31

    In [35]: def calculate(x):
                 baseline = max(frame['level'], frame['level'].shift(1))  # doesn't work
                 # baseline = x['level'] + 4  # works
                 difftobase = x['level'] - baseline
                 return baseline, difftobase

             frame['baseline'], frame['difftobase'] = zip(*frame.apply(calculate, axis=1
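
The error message itself is cut off, but the max(...) call compares two whole Series, which is what raises the "truth value of a Series is ambiguous" error. A minimal sketch of one way around it, taking the element-wise maximum of the column and its shifted copy without going through apply (assuming that is the intended baseline):

    # Sketch: element-wise maximum of 'level' and its previous value,
    # then the difference to that baseline. Assumes this matches the intent.
    import pandas as pd

    frame = pd.DataFrame({'level': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31]})

    shifted = frame['level'].shift(1)
    frame['baseline'] = pd.concat([frame['level'], shifted], axis=1).max(axis=1)
    frame['difftobase'] = frame['level'] - frame['baseline']
    print(frame)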

How to update some of the rows from another series in pandas using df.update

Submitted by ぐ巨炮叔叔 on 2019-12-11 19:26:22
Question: I have a df like:

          stamp  value
    0  00:00:00      2
    1  00:00:00      3
    2  01:00:00      5

Converting to timedelta:

    df['stamp'] = pd.to_timedelta(df['stamp'])

Slicing only the odd index and adding 30 minutes:

    odd_df = pd.to_timedelta(df[1::2]['stamp']) + pd.to_timedelta('30 min')
    # print(odd_df)
    1   00:30:00
    Name: stamp, dtype: timedelta64[ns]

Now, updating df with odd_df; as per the documentation it should give my expected output.

Expected output:

    df.update(odd_df)
    # print(df)
          stamp  value
    0  00:00:00      2
    1  00:30:00      3
    2  01:00:00      5
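
The excerpt ends before the question says what actually goes wrong, but if df.update does not produce the expected frame, positional assignment sidesteps index alignment entirely. A minimal sketch, assuming the goal is simply to add 30 minutes to every odd-indexed row:

    # Sketch: add 30 minutes to the odd-indexed rows in place.
    import pandas as pd

    df = pd.DataFrame({'stamp': ['00:00:00', '00:00:00', '01:00:00'],
                       'value': [2, 3, 5]})
    df['stamp'] = pd.to_timedelta(df['stamp'])

    df.loc[df.index[1::2], 'stamp'] += pd.to_timedelta('30 min')
    print(df)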

How to pivot_table with duplicated index

Submitted by 我的未来我决定 on 2019-12-11 17:13:41
Question: I have a df_ like this:

    name  level  status
    yes   high   open
    no    high   closed
    no    med    closed
    yes   low    open
    no    med    rejected
    no    high   open

I am trying to create a pivot table with index='level', columns='status', and values equal to the count of occurrences for each index/column pair.

My code:

    df_['temp'] = df_['level'].astype(bool).astype(int)
    df_.pivot(index='level', columns='status', values='temp')

but it gives me:

    ValueError: Index contains duplicate entries, cannot reshape

My expected output is:

    open  closed  rejected
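
The expected output is cut off, but counting level/status combinations is exactly what DataFrame.pivot cannot do when the index repeats; pd.crosstab (or pivot_table with an aggregation function) handles it. A minimal sketch, assuming a plain count of occurrences is what is wanted:

    # Sketch: count occurrences of each level/status combination.
    import pandas as pd

    df_ = pd.DataFrame({'name':   ['yes', 'no', 'no', 'yes', 'no', 'no'],
                        'level':  ['high', 'high', 'med', 'low', 'med', 'high'],
                        'status': ['open', 'closed', 'closed', 'open', 'rejected', 'open']})

    table = pd.crosstab(df_['level'], df_['status'])
    print(table)

    # equivalent with pivot_table:
    # df_.pivot_table(index='level', columns='status', values='name',
    #                 aggfunc='count', fill_value=0)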

How to re-order a pandas dataframe based on a dictionary condition

Submitted by 可紊 on 2019-12-11 17:03:16
Question: I have a df like this:

         case  step                  deep        value
    0  case 1     1          ram in India  ram,cricket
    1     NaN     2     ram plays cricket          NaN
    2  case 2     1  ravi played football         ravi
    3     NaN     2      ravi works welll          NaN
    4  case 3     1      Sri bought a car          sri
    5     NaN     2          sri went out          NaN

and a dictionary:

    my_dict = {'ram': 1, 'cricket': 1, 'ravi': 2.5, 'sri': 1}

I am trying to re-order the dataframe according to the values of the dictionary (I built this dictionary using a tf-idf method). I face difficulty in re-ordering because we need to re-order the rows including
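
The question is cut off before the expected output, but a common pattern for this kind of re-ordering is to compute one score per case from the dictionary and then sort whole case blocks by that score while keeping the step order inside each block. A minimal sketch under that assumption (sorting in descending score order is also an assumption):

    # Sketch: re-order whole 'case' blocks by a dictionary score, keeping
    # the rows of each case together. The ordering direction is an assumption.
    import pandas as pd

    df = pd.DataFrame({
        'case':  ['case 1', None, 'case 2', None, 'case 3', None],
        'step':  [1, 2, 1, 2, 1, 2],
        'deep':  ['ram in India', 'ram plays cricket', 'ravi played football',
                  'ravi works welll', 'Sri bought a car', 'sri went out'],
        'value': ['ram,cricket', None, 'ravi', None, 'sri', None],
    })
    my_dict = {'ram': 1, 'cricket': 1, 'ravi': 2.5, 'sri': 1}

    # propagate each case label down to its continuation rows, then score each block
    df['case_filled'] = df['case'].ffill()

    def block_score(values):
        # each block carries its keywords in the first non-NaN 'value' entry
        words = str(values.dropna().iloc[0]).split(',')
        return sum(my_dict.get(w.strip(), 0) for w in words)

    scores = df.groupby('case_filled')['value'].apply(block_score)
    df['score'] = df['case_filled'].map(scores)

    df = (df.sort_values(['score', 'case_filled', 'step'],
                         ascending=[False, True, True])
            .drop(columns=['case_filled', 'score']))
    print(df)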

SQL: Display distinct ids for all sets of values from a table

Submitted by 随声附和 on 2019-12-11 16:23:49
Question: I have a problem where, after executing a query, I get a result like this:

    DevID  Difference
    -----------------
    99     5
    99     10
    99     5
    99     4
    12     8
    12     9
    12     5
    12     6

I don't want the duplicate ids; I should be able to display only one row per id. This could easily be achieved by using DISTINCT; however, the problem is I also need to display the Difference column. I'm not bothered which value ends up in Difference, either one of the values for 99 can appear there, but basically I just need one value per id. Expected result
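
The expected result is cut off, and the SQL answer is not included in this excerpt (typically this is a GROUP BY DevID with an aggregate such as MIN or MAX on Difference). The same idea of keeping one arbitrary Difference per DevID, sketched in pandas with the result set above loaded into a DataFrame:

    # Sketch: keep one row per DevID, taking whichever Difference comes first.
    import pandas as pd

    result = pd.DataFrame({'DevID':      [99, 99, 99, 99, 12, 12, 12, 12],
                           'Difference': [5, 10, 5, 4, 8, 9, 5, 6]})

    one_per_id = result.groupby('DevID', as_index=False)['Difference'].first()
    print(one_per_id)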

How to find the peak coordinate from a dataset

Submitted by *爱你&永不变心* on 2019-12-11 11:06:13
Question: I have a dataset. This is the graph I drew using it: [graph not included in this excerpt]. How do I find the coordinate of the peak value from this dataset? Does anyone have a good Java algorithm for this issue?

Answer 1: For this dataset specifically, I would do the following:

- Make the data stationary by taking first differences.
- Signal when the data is above some threshold level. You can use a fixed threshold or an adaptive threshold (as in this answer, for example).

When I use the dataset from this question, for
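
The worked example in the answer is cut off. A small sketch of one reading of that recipe, written in Python rather than the requested Java, with a placeholder dataset and threshold (the question's real data is not shown in this excerpt):

    # Sketch: flag peaks by taking first differences on both sides of a point
    # and thresholding them. Sample data and threshold are placeholders.
    def find_peaks(values, threshold=3.0):
        peaks = []  # (index, value) pairs
        for i in range(1, len(values) - 1):
            rise = values[i] - values[i - 1]   # first difference going in
            fall = values[i] - values[i + 1]   # first difference going out
            if rise > threshold and fall > threshold:
                peaks.append((i, values[i]))
        return peaks

    data = [1, 2, 9, 2, 1, 3, 2, 8, 7, 8, 2, 1]
    print(find_peaks(data))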

How to reduce part of a dataframe column value based on another column

Submitted by 给你一囗甜甜゛ on 2019-12-11 09:08:37
Question: I have a dataframe like this. I am trying to remove the string that appears in the substring column from the Main column.

    Main                      substring
    Sri playnig well cricket  cricket
    sri went out              NaN
    Ram is in                 NaN
    Ram went to UK,US         UK,US

My expected output is:

    Main              substring
    Sri playnig well  cricket
    sri went out      NaN
    Ram is in         NaN
    Ram went to       UK,US

I tried df["Main"].str.reduce(df["substring"]) but it is not working; please help.

Answer 1: This is one way using pd.DataFrame.apply. Note that np.nan == np.nan evaluates to False; we can use
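
The answer is cut off, but the row-wise apply it describes can be sketched as follows, using pd.notna in place of the np.nan == np.nan comparison the answer alludes to (stripping the leftover whitespace is an assumption about the expected output):

    # Sketch: remove each row's 'substring' value from its 'Main' value,
    # leaving rows with a NaN substring untouched.
    import pandas as pd

    df = pd.DataFrame({
        'Main': ['Sri playnig well cricket', 'sri went out',
                 'Ram is in', 'Ram went to UK,US'],
        'substring': ['cricket', None, None, 'UK,US'],
    })

    df['Main'] = df.apply(
        lambda r: r['Main'].replace(r['substring'], '').strip()
                  if pd.notna(r['substring']) else r['Main'],
        axis=1)
    print(df)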

Need help fixing 'Unnamed' column names in a pandas dataframe

Submitted by 喜你入骨 on 2019-12-11 08:54:39
Question: How do I change my indexes from "Unnamed" to the first line of my dataframe in Python?

    import pandas as pd
    df = pd.read_excel('example.xls', 'Day_Report', index_col=None, skip_footer=31, index=False)
    df = df.dropna(how='all', axis=1)
    df = df.dropna(how='all')
    df = df.drop(2)

Answer 1: To set the column names (assuming that's what you mean by "indexes") to the first row, you can use

    df.columns = df.loc[0, :].values

Following that, if you want to drop the first row, you can use

    df.drop(0, inplace=True)

Edit: As
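
The edit is cut off. A further alternative worth noting is to let read_excel pick the header row up directly through its header argument; a minimal sketch, assuming the real column names sit on the second row of the sheet (that row number is an assumption, since blank rows in the sheet may shift it):

    # Sketch: read the sheet so that the real header row becomes the column names.
    # header=1 (second row) is an assumption; skipfooter is the newer-pandas
    # spelling of the question's skip_footer.
    import pandas as pd

    df = pd.read_excel('example.xls', sheet_name='Day_Report', header=1, skipfooter=31)
    df = df.dropna(how='all', axis=1).dropna(how='all')
    print(df.columns)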