data-analysis

How to convert row values in a dataframe to column labels in Python after groupby?

假如想象 submitted on 2019-12-04 19:15:08
I have a specific case where I want to convert this df:

    print df
       Schoolname   Attribute    Value
    0  xyz School   Safe         3.44
    1  xyz School   Cleanliness  2.34
    2  xyz School   Money        4.65
    3  abc School   Safe         4.40
    4  abc School   Cleanliness  4.50
    5  abc School   Money        4.90
    6  lmn School   Safe         2.34
    7  lmn School   Cleanliness  3.89
    8  lmn School   Money        4.65

which I need to get into this format so that I can convert it to a numpy array for linear regression modelling.

    required_df:
       Schoolname  Safe  Cleanliness  Money
    0  xyz School  3.44  2.34         4.65
    1  abc School  4.40  4.50         4.90
    2  lmn School  2.34  3.89         4.65

I know we need to do groupby('Schoolname
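A minimal sketch of the reshape, assuming the long-format frame shown above and that each (Schoolname, Attribute) pair occurs exactly once: DataFrame.pivot spreads the Attribute values into columns, and the numeric block can then go straight to NumPy. The names attributes and school_order are illustrative only.

```python
import pandas as pd

# Long-format data from the question
df = pd.DataFrame({
    "Schoolname": ["xyz School"] * 3 + ["abc School"] * 3 + ["lmn School"] * 3,
    "Attribute": ["Safe", "Cleanliness", "Money"] * 3,
    "Value": [3.44, 2.34, 4.65, 4.40, 4.50, 4.90, 2.34, 3.89, 4.65],
})

attributes = ["Safe", "Cleanliness", "Money"]
# Keep schools in order of first appearance (pivot alone sorts them alphabetically)
school_order = pd.Index(df["Schoolname"].unique(), name="Schoolname")

wide = (
    df.pivot(index="Schoolname", columns="Attribute", values="Value")
      .reindex(index=school_order, columns=attributes)
      .reset_index()
)
wide.columns.name = None  # drop the leftover "Attribute" axis label

# Feature matrix for the regression: one row per school, one column per attribute
X = wide[attributes].to_numpy()
```

Since the question mentions groupby: df.groupby(["Schoolname", "Attribute"])["Value"].mean().unstack() produces the same wide shape and also copes with repeated pairs, where pivot would raise a ValueError.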

Pandas and Python Dataframes and Conditional Shift Function

邮差的信 submitted on 2019-12-04 17:49:21
Is there a conditional "shift" parameter in data frames? For example, assume I own a used car lot and I have data as follows:

    SaleDate    Car
    12/1/2016   Wrangler
    12/2/2016   Camry
    12/3/2016   Wrangler
    12/7/2016   Prius
    12/10/2016  Prius
    12/12/2016  Wrangler

I want to find out two things from this list:

1) For each sale, when was the last day that a car was sold? This is simple in Pandas, just a shift:

    df['PriorSaleDate'] = df['SaleDate'].shift()

2) For each sale, when was the prior date that the same type of car was sold? So, for example, the Wrangler sale on 12/3 would point two rows
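A minimal sketch of part 2, assuming the rows are already sorted by SaleDate: grouping by Car before shifting makes each row look back only at earlier sales of the same model. The column name PriorSameCarDate is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "SaleDate": pd.to_datetime(
        ["12/1/2016", "12/2/2016", "12/3/2016", "12/7/2016", "12/10/2016", "12/12/2016"],
        format="%m/%d/%Y",
    ),
    "Car": ["Wrangler", "Camry", "Wrangler", "Prius", "Prius", "Wrangler"],
})

# 1) Previous sale of any car: a plain shift over the whole column
df["PriorSaleDate"] = df["SaleDate"].shift()

# 2) Previous sale of the same car type: the shift happens within each Car group,
#    and the result is aligned back to the original rows
df["PriorSameCarDate"] = df.groupby("Car")["SaleDate"].shift()
```

The first sale of each model gets NaT, because there is no earlier sale of that model to point to.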

Python Pandas join dataframes on index

风流意气都作罢 submitted on 2019-12-04 16:42:03
Question: I am trying to join two dataframes on the same column "Date". The code is as follows:

    import pandas as pd
    from datetime import datetime
    df_train_csv = pd.read_csv('./train.csv', parse_dates=['Date'], index_col='Date')
    start = datetime(2010, 2, 5)
    end = datetime(2012, 10, 26)
    df_train_fly = pd.date_range(start, end, freq="W-FRI")
    df_train_fly = pd.DataFrame(pd.Series(df_train_fly), columns=['Date'])
    merged = df_train_csv.join(df_train_fly.set_index(['Date']), on=['Date'], how='right', lsuffix='
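The question's call is cut off, so this is only a sketch under assumptions: a small hypothetical df_train stands in for train.csv, and the goal is one row per Friday in the range. When both frames are indexed by Date, DataFrame.join aligns on the index by default, so on= is not needed.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: a few rows indexed by Date
df_train = pd.DataFrame(
    {"Sales": [100.0, 120.0, 90.0]},
    index=pd.DatetimeIndex(["2010-02-05", "2010-02-12", "2010-02-26"], name="Date"),
)

# Every Friday in the range, held as an index with no columns
fridays = pd.DataFrame(index=pd.date_range("2010-02-05", "2010-03-05", freq="W-FRI", name="Date"))

# Index-on-index join; how="right" keeps every Friday, NaN where train has no row
merged = df_train.join(fridays, how="right")
```

Because the Friday frame has no columns of its own, this is equivalent to df_train.reindex(fridays.index).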

date range for six monthly in pandas

﹥>﹥吖頭↗ submitted on 2019-12-04 15:21:14
So, this is my data frame.

        PatientNumber  QT              Answer  Answerdate  DiagnosisDate
    1   1              transferring    No      2017-03-03  2018-05-03
    2   1              preparing food  No      2017-03-03  2018-05-03
    3   1              medications     Yes     2017-03-03  2018-05-03
    4   2              transferring    No      2011-05-10  2012-05-04
    5   2              preparing food  No      2011-05-10  2012-05-04
    6   2              medications     No      2011-05-10  2012-05-04
    7   2              transferring    Yes     2011-15-03  2012-05-04
    8   2              preparing food  Yes     2011-15-03  2012-05-04
    9   2              medications     No      2011-15-03  2012-05-04
    10  2              transferring    Yes     2010-15-12  2012-05-04
    11  2              preparing food  No      2010-15-12  2012-05-04
    12  2              medications     No      2010-15-12  2012-05-04
    13  2
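The excerpt stops before the actual question, so this is only a guess at what "six monthly" means here: a sketch of two common ways to build six-month date ranges in pandas, either fixed calendar steps with the "6MS" (six-month-start) frequency alias, or steps counted back from a reference date with pd.DateOffset(months=6). The reference date is patient 2's DiagnosisDate from the table; the bounds of edges are arbitrary.

```python
import pandas as pd

# Calendar-aligned: the first day of every sixth month between two bounds
edges = pd.date_range("2010-01-01", "2012-06-30", freq="6MS")

# Relative: four dates six months apart, ending at a DiagnosisDate from the table
diagnosis = pd.Timestamp("2012-05-04")
lookback = pd.date_range(end=diagnosis, periods=4, freq=pd.DateOffset(months=6))
```

The calendar version suits fixed reporting periods; the relative version suits questions such as "what did the patient answer in the six months before diagnosis".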

Pandas compare each row with all rows in data frame and save results in list for each row

梦想与她 submitted on 2019-12-04 13:35:01
Question: I am trying to compare each row with all rows in a pandas DataFrame through fuzzywuzzy.fuzz.partial_ratio() >= 85 and write the results in a list for each row.

in:

    df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6], 'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

Use a pandas function with the fuzzywuzzy library to get this result:

out:

    id  name      match_id_list
    1   dog       [4, 5]
    2   cat       [3, ]
    3   mad cat   [2, ]
    4   good dog  [1, 5]
    5   bad dog   [1, 4]
    6   chicken   []

But I don't understand how to get this.

Answer 1: The first step would be to
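A sketch of the all-pairs comparison. So that it runs without fuzzywuzzy installed, the default scorer is a hypothetical stand-in, approx_partial_ratio, built on the standard library's difflib; it only approximates fuzz.partial_ratio, so its matches can differ from the expected output above. With fuzzywuzzy available, pass fuzz.partial_ratio as the scorer instead. The loop makes O(n²) comparisons, which is fine for small frames but slow for large ones.

```python
from difflib import SequenceMatcher

import pandas as pd


def approx_partial_ratio(a: str, b: str) -> int:
    """Hypothetical stand-in for fuzz.partial_ratio (0-100 similarity score)."""
    shorter, longer = sorted((a, b), key=len)
    if shorter in longer:
        return 100  # a full substring match counts as a perfect partial match
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def add_match_lists(df, scorer=approx_partial_ratio, threshold=85):
    """Return a copy of df where each row lists the ids of all *other* rows
    whose name scores at or above threshold against this row's name."""
    ids = df["id"].tolist()
    names = df["name"].tolist()
    matches = [
        [ids[j] for j, other in enumerate(names) if j != i and scorer(name, other) >= threshold]
        for i, name in enumerate(names)
    ]
    out = df.copy()
    # dtype=object keeps each Python list as a single cell value
    out["match_id_list"] = pd.Series(matches, index=df.index, dtype=object)
    return out


df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "name": ["dog", "cat", "mad cat", "good dog", "bad dog", "chicken"]})
result = add_match_lists(df)
# With fuzzywuzzy: from fuzzywuzzy import fuzz; add_match_lists(df, scorer=fuzz.partial_ratio)
```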

Customizing rolling_apply function in Python pandas

徘徊边缘 submitted on 2019-12-04 12:53:56
Setup

I have a DataFrame with three columns:
"Category" contains True and False, and I have done df.groupby('Category') to group by these values.
"Time" contains timestamps (measured in seconds) at which values have been recorded.
"Value" contains the values themselves.

At each time instance, two values are recorded: one has category "True", and the other has category "False".

Rolling apply question

Within each category group, I want to compute a number and store it in column Result for each time. Result is the percentage of values between time t-60 and t that fall between 1 and 3. The
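A sketch of the per-category rolling percentage, assuming Time is in seconds and "between t-60 and t" means the window (t-60, t], which is what an offset window such as "60s" covers by default (closed="both" would also include t-60). It uses the .rolling(...) method, since current pandas no longer provides the top-level pd.rolling_apply; because the statistic is just the mean of 0/1 flags, no custom apply function is needed. The function name add_rolling_pct and the sample data are illustrative.

```python
import pandas as pd


def add_rolling_pct(df, window="60s", low=1, high=3):
    """Return a copy of df with Result = % of the same Category's values in
    [low, high] within the time window ending at each row's Time (seconds)."""
    out = df.sort_values("Time").copy()
    out["Result"] = float("nan")
    for _, grp in out.groupby("Category"):
        # 1.0 where the value lies in [low, high] (inclusive), else 0.0
        flags = grp["Value"].between(low, high).astype(float)
        # an offset-based window such as "60s" needs a datetime-like index
        flags.index = pd.to_datetime(grp["Time"], unit="s")
        pct = flags.rolling(window).mean() * 100
        out.loc[grp.index, "Result"] = pct.to_numpy()
    return out


df = pd.DataFrame({
    "Category": [True, False, True, False, True, False],
    "Time": [0, 0, 30, 30, 90, 90],
    "Value": [2.0, 0.0, 5.0, 2.0, 1.5, 2.5],
})
result = add_rolling_pct(df)
```

At Time 90 the window (30, 90] excludes the rows at 0 and 30, so each category's Result there depends only on its value at 90.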

How to capture raw signal from wireless router?

Deadly submitted on 2019-12-04 11:35:15
Question: I have seen several projects now which derive novel spatial information from radio data collected from a typical wireless router: http://wisee.cs.washington.edu/ and http://www.extremetech.com/extreme/133936-using-wifi-to-see-through-walls. The idea of using a wireless router as a sort of passive radar is fantastic. I am very interested in experimenting with data collected from a wireless router myself, but there is little information on how to go about actually interfacing with a wireless router

Huge sparse dataframe to scipy sparse matrix without dense transform

这一生的挚爱 submitted on 2019-12-04 09:40:06
I have data with more than 1 million rows and 30 columns; one of the columns is user_id (more than 1500 different users). I want to one-hot-encode this column and use the data in ML algorithms (xgboost, FFM, scikit). But due to the huge number of rows and unique user values, the matrix will be ~1 million x 1500, so I need to do this in sparse format (otherwise the data will use up all the RAM). For me, the convenient way to work with the data is through a pandas DataFrame, which now also supports sparse format:

    df = pd.get_dummies(df, columns=['user_id', 'type'], sparse=True)

This works pretty fast and takes little RAM. But for working with
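One way to avoid the dense intermediate altogether, sketched under the assumptions that scipy is available and that user_id has no missing values: pd.factorize maps each user_id to an integer column index, and scipy.sparse.csr_matrix is built directly from (row, column) coordinates, so the ~1 million x 1500 block is never materialised. This is an alternative to get_dummies(sparse=True), not a conversion of its output; for a frame whose columns are all sparse, pandas also offers df.sparse.to_coo(). The helper one_hot_sparse and the tiny frame are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def one_hot_sparse(col: pd.Series):
    """One-hot encode a column straight into a CSR matrix (no dense step).
    Assumes no missing values: factorize marks NaN with code -1."""
    codes, uniques = pd.factorize(col)
    n = len(col)
    mat = sparse.csr_matrix(
        (np.ones(n, dtype=np.float32), (np.arange(n), codes)),
        shape=(n, len(uniques)),
    )
    return mat, uniques


# Tiny hypothetical frame: one numeric feature plus user_id
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "user_id": ["a", "b", "a", "c"]})

user_mat, user_labels = one_hot_sparse(df["user_id"])
numeric = sparse.csr_matrix(df[["x"]].to_numpy(dtype=np.float32))
# Stack side by side; the result stays CSR, which xgboost and many
# scikit-learn estimators accept as input
X = sparse.hstack([numeric, user_mat], format="csr")
```

user_labels records which user each one-hot column belongs to, in the order factorize first saw them.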

`error: unbalanced parenthesis` while checking if an item is present in a pandas dataframe

ⅰ亾dé卋堺 submitted on 2019-12-04 07:12:50
    df = pd.DataFrame({"A": ["one", "two", "three"], "B": ["fopur", "give", "six"]})

When I do

    df.B.str.contains("six").any()

I get out[2]=True. When I do

    df.B.str.contains("six)").any()

I get the error below:

    C:\ProgramData\Anaconda3\lib\sre_parse.py in parse(str, flags, pattern)
        868     if source.next is not None:
        869         assert source.next == ")"
    --> 870         raise source.error("unbalanced parenthesis")
        871
        872     if flags & SRE_FLAG_DEBUG:

    error: unbalanced parenthesis at position 3

Please help!

You can set regex=False in pandas.Series.str.contains:

    df.B.str.contains("six)", regex=False).any()

If you want to
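A short sketch of why the error occurs and two ways around it: with the default regex=True, "six)" is compiled as a regular expression, where ")" closes a group that was never opened. Passing regex=False treats the pattern as a plain substring; escaping the text with the standard library's re.escape is an alternative that keeps regex matching available for other parts of a pattern.

```python
import re

import pandas as pd

df = pd.DataFrame({"A": ["one", "two", "three"], "B": ["fopur", "give", "six"]})

plain = df.B.str.contains("six").any()

# regex=False: ")" is matched literally, so no error (and no value contains "six)")
literal = df.B.str.contains("six)", regex=False).any()

# Regex matching with the user-supplied text escaped first
escaped = df.B.str.contains(re.escape("six)")).any()
```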

Plotting multiple segments with colors based on some variable with matplotlib

天大地大妈咪最大 submitted on 2019-12-04 06:02:12
Question: Following the answers of both topics "Matplotlib: Plotting numerous disconnected line segments with different colors" and "matplotlib: how to change data points color based on some variable", I am trying to plot a set of segments given by a list, for instance:

    data = [(-118, -118), (34.07, 34.16), (-117.99, -118.15), (34.07, 34.16), (-118, -117.98), (34.16, 34.07)]

and I would like to plot each segment with a color based on a second list, for instance:

    color_param = [9, 2, 21]

with a colormap. So
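A sketch with matplotlib's LineCollection, assuming data follows the layout used in the first linked topic: it alternates x-pairs and y-pairs ((x0, x1), (y0, y1), ...), so the six tuples describe three segments, one per entry in color_param. The viridis colormap and the line width are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

data = [(-118, -118), (34.07, 34.16), (-117.99, -118.15), (34.07, 34.16),
        (-118, -117.98), (34.16, 34.07)]
color_param = [9, 2, 21]

# Pair each x-tuple with the following y-tuple: segment i is [(x0, y0), (x1, y1)]
segments = [list(zip(xs, ys)) for xs, ys in zip(data[0::2], data[1::2])]

lc = LineCollection(segments, cmap="viridis", linewidths=2)
lc.set_array(np.asarray(color_param))  # one scalar per segment, mapped through cmap

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()  # make sure the view limits cover the segments
fig.colorbar(lc, ax=ax, label="color_param")
# plt.show() or fig.savefig("segments.png") to display or save
```

A single collection draws all segments in one call, which scales better than one plt.plot call per segment when there are many of them.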