pandas

Pandas groupby in combination with sklearn preprocessing continued

走远了吗. posted on 2021-02-07 20:24:11
Question: This continues from the post "Pandas groupby in combination with sklearn preprocessing". I need to preprocess data by scaling it within groups defined by two columns, but the second method raises an error:

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import robust_scale, minmax_scale

    df = pd.DataFrame(
        dict(
            id=list('AAAAABBBBB'),
            loc=(10,20,10,20,10,20,10,20,10,20),
            value=(0,10,10,20,100,100,200,30,40,100)))
    df['new'] = df.groupby(['id','loc']).value.transform(lambda x:minmax_scale
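The question text is cut off before the failing call, so the following is only a minimal sketch of scaling within groups defined by two columns. It swaps sklearn's `minmax_scale` for a plain pandas min-max formula so the example has no extra dependency; the helper name `minmax` and the choice to map constant groups to 0.0 are assumptions, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        id=list("AAAAABBBBB"),
        loc=(10, 20, 10, 20, 10, 20, 10, 20, 10, 20),
        value=(0, 10, 10, 20, 100, 100, 200, 30, 40, 100),
    )
)

def minmax(x):
    """Scale one group's values to [0, 1] (hypothetical helper)."""
    span = x.max() - x.min()
    if span == 0:
        # Assumption: a constant group maps to 0.0 instead of dividing by zero.
        return pd.Series(0.0, index=x.index)
    return (x - x.min()) / span

# transform() returns a result aligned with the original rows,
# so it can be assigned straight back as a new column.
df["new"] = df.groupby(["id", "loc"])["value"].transform(minmax)
```

Selecting the column with `["value"]` rather than attribute access avoids clashes with DataFrame attributes, which matters here because one of the columns is named `loc`.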

JSON to CSV output using pandas

耗尽温柔 posted on 2021-02-07 20:24:10
Question: I am trying to convert the .json file below to .csv using pandas.

Input JSON file name: my_json_file.json

    {
      "profile_set": [
        {
          "doc_type": "PROFILE",
          "key": "123",
          "mem_list": {
            "mem_num": "10001",
            "current_flag": "Y",
            "mem_flag": [ ],
            "child_mem_list": { "child_mem_num": [ ] }
          },
          "first_name": "Robert",
          "middle_name": [ ],
          "last_name": "John",
          "created_datetime": "2018-01-06T12:52:09"
        },
        {
          "doc_type": "PROFILE",
          "key": "456",
          "mem_list": {
            "mem_num": "10002",
            "current_flag": "Y",
            "mem_flag": "Y",
            "child_mem
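One possible approach, sketched here under assumptions, is to flatten the nested records with `pd.json_normalize` and then write the result with `to_csv`. Only the first record is used because the second one is truncated in the question; the desired output columns are not shown there either, so the dotted column names are simply what `json_normalize` produces by default.

```python
import io
import json
import pandas as pd

# Only the first profile from the question; the second is cut off in the source.
raw = """
{"profile_set": [
  {"doc_type": "PROFILE", "key": "123",
   "mem_list": {"mem_num": "10001", "current_flag": "Y", "mem_flag": [],
                "child_mem_list": {"child_mem_num": []}},
   "first_name": "Robert", "middle_name": [], "last_name": "John",
   "created_datetime": "2018-01-06T12:52:09"}
]}
"""

data = json.loads(raw)

# Nested dicts become dotted columns such as "mem_list.mem_num".
flat = pd.json_normalize(data["profile_set"])

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Empty lists such as `middle_name` are kept as list objects and show up as `[]` in the CSV, so they may need separate handling depending on the target format.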

How to show label names in pandas groupby histogram plot

社会主义新天地 posted on 2021-02-07 20:23:31
Question: I can plot multiple histograms in a single plot using pandas, but a few things are missing:

- How do I give each histogram a label?
- I can only plot one figure. How do I change it to layout=(3,1) or something else?
- In figure 1, all the bins are filled with solid colors, and it is difficult to tell which is which. How do I fill them with different markers (e.g. crosses, slashes, etc.)?

Here is the MWE:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = sns

Python Pandas merge multiple columns into a dictionary column

女生的网名这么多〃 posted on 2021-02-07 20:17:51
Question: I have a dataframe (df_full) like so:

    |cust_id|address    |store_id|email        |sales_channel|category|
    -------------------------------------------------------------------
    |1234567|123 Main St|10SjtT  |idk@gmail.com|ecom         |direct  |
    |4567345|345 Main St|10SjtT  |101@gmail.com|instore      |direct  |
    |1569457|876 Main St|51FstT  |404@gmail.com|ecom         |direct  |

and I would like to combine the last 4 fields into one metadata field that is a dictionary, like so:

    |cust_id|address |metadata |
    ---------------------------
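A minimal sketch of one way to do this: `DataFrame.to_dict(orient="records")` yields one dictionary per row, which can be assigned as a new column. The expected output is truncated in the question, so the exact dictionary layout and the names `meta_cols` and `result` are assumptions.

```python
import pandas as pd

df_full = pd.DataFrame(
    {
        "cust_id": [1234567, 4567345, 1569457],
        "address": ["123 Main St", "345 Main St", "876 Main St"],
        "store_id": ["10SjtT", "10SjtT", "51FstT"],
        "email": ["idk@gmail.com", "101@gmail.com", "404@gmail.com"],
        "sales_channel": ["ecom", "instore", "ecom"],
        "category": ["direct", "direct", "direct"],
    }
)

# The last four fields named in the question.
meta_cols = ["store_id", "email", "sales_channel", "category"]

# One dict per row, keyed by column name.
df_full["metadata"] = df_full[meta_cols].to_dict(orient="records")

# Drop the source columns so only cust_id, address and metadata remain.
result = df_full.drop(columns=meta_cols)
```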

Add custom legend to bokeh Bar

别等时光非礼了梦想. posted on 2021-02-07 19:59:37
Question: I have a pandas Series:

    >>> etypes
    0    6271
    1    6379
    2     399
    3     110
    4    4184
    5    1987

and I want to draw a bar chart in Bokeh: p = Bar(etypes). However, for the legend I get just the etypes index numbers, which I tried to translate with this dictionary:

    legend = {
        0: 'type_1',
        1: 'type_2',
        2: 'type_3',
        3: 'type_4',
        4: 'type_5',
        5: 'type_6',
    }

by passing it to the label argument: p = Bar(etypes, label=legend), but it didn't work. Passing list(legend.values()) does not work either. Any ideas how to add custom legend
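One option that does not depend on how `Bar` interprets its `label` argument is to relabel the Series index before plotting, so the chart receives the type names directly. This sketch covers only the pandas step; whether a given Bokeh version then uses those labels in the legend is an assumption that is not verified here, and the name `labeled` is hypothetical.

```python
import pandas as pd

etypes = pd.Series([6271, 6379, 399, 110, 4184, 1987])

legend = {
    0: "type_1",
    1: "type_2",
    2: "type_3",
    3: "type_4",
    4: "type_5",
    5: "type_6",
}

# Replace the integer index with the readable names from the mapping.
labeled = etypes.rename(index=legend)
```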

Specify converter for Pandas index column in read_csv

本小妞迷上赌 posted on 2021-02-07 19:51:55
Question: I am attempting to read in a CSV file with hexadecimal numbers in the index column:

    InputBits, V0, V1, V2, V3
    7A, 0.000594457716, 0.000620631282, 0.000569834178, 0.000625374384,
    7B, 0.000601155649, 0.000624282078, 0.000575955914, 0.000632111367,
    7C, 0.000606026872, 0.000629149805, 0.000582689823, 0.000634561234,
    7D, 0.000612115902, 0.000634625998, 0.000584526357, 0.000638235952,
    7E, 0.000615769413, 0.000637668328, 0.000590648093, 0.00064987256,
    7F, 0.000620640637, 0.000643144494, 0
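A hedged sketch of one way to handle this: apply a converter to the `InputBits` column by name and then promote that column to the index. Two rows from the question are used as sample input. Stripping the trailing commas before parsing is an assumption made here so that the extra empty field does not shift the columns; the original file may call for a different cleanup.

```python
import io
import pandas as pd

raw = """InputBits, V0, V1, V2, V3
7A, 0.000594457716, 0.000620631282, 0.000569834178, 0.000625374384,
7B, 0.000601155649, 0.000624282078, 0.000575955914, 0.000632111367,
"""

# Assumption: drop the trailing comma on each data row before parsing.
cleaned = "\n".join(line.rstrip().rstrip(",") for line in raw.splitlines())

df = pd.read_csv(
    io.StringIO(cleaned),
    skipinitialspace=True,  # header and values have a space after each comma
    converters={"InputBits": lambda s: int(s, 16)},  # parse hex text as int
).set_index("InputBits")
```

Setting the index after reading keeps the converter step and the indexing step independent of each other.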

How to sort data frame by column values?

僤鯓⒐⒋嵵緔 posted on 2021-02-07 19:50:48
Question: I am relatively new to Python and pandas data frames, so maybe I have missed something very easy here. I had a data frame with many rows and columns, and in the end I managed to reduce it to a single row holding the maximum value of each column. I used this code to do that:

    import pandas as pd

    d = {'A' : [1.2, 2, 4, 6],
         'B' : [2, 8, 10, 12],
         'C' : [5, 3, 4, 5],
         'D' : [3.5, 9, 1, 11],
         'E' : [5, 8, 7.5, 3],
         'F' : [8.8, 4, 3, 2]}
    df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
    print(df)

    Out:
    A B C
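The rest of the question is truncated, so this sketch assumes the goal is to order the per-column maxima from largest to smallest. `DataFrame.max()` returns a Series indexed by column name, which `sort_values` can then order; the names `col_max`, `ranked`, and `ranked_row` are hypothetical.

```python
import pandas as pd

d = {
    "A": [1.2, 2, 4, 6],
    "B": [2, 8, 10, 12],
    "C": [5, 3, 4, 5],
    "D": [3.5, 9, 1, 11],
    "E": [5, 8, 7.5, 3],
    "F": [8.8, 4, 3, 2],
}
df = pd.DataFrame(d, index=["a", "b", "c", "d"])

# Maximum of each column, as a Series indexed by column name.
col_max = df.max()

# Order the maxima from largest to smallest.
ranked = col_max.sort_values(ascending=False)

# Back to a single-row frame with the columns in sorted order.
ranked_row = ranked.to_frame().T
```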

Python : Different behaviour of DatetimeIndex while plotting line and bar plots using DataFrame

狂风中的少年 posted on 2021-02-07 19:50:38
Question: I have a DataFrame whose row index is a DatetimeIndex. The index is rendered differently on the x-axis depending on whether I make a line plot or a bar plot. My code is as follows:

    import datetime
    import pandas as pd

    start_date = datetime.datetime.strptime('2017-02-20', '%Y-%m-%d').date()
    end_date = datetime.datetime.strptime('2017-02-23', '%Y-%m-%d').date()
    daterange = pd.date_range(start_date, end_date)
    df = pd.DataFrame(index=daterange,
                      data={'Male': [12, 23, 13, 11], 'Female': [10, 25, 15, 9]})
    df.plot(kind='line')
    df.plot(kind='bar
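Bar plots treat the index as categorical labels, while line plots treat a DatetimeIndex as a time axis, which is a likely source of the difference. One workaround, sketched here as an assumption about what the asker wants, is to format the index as date strings before making the bar plot. Only the pandas step is shown; `bar_df` is a hypothetical name, and the resulting frame would be passed to `plot(kind='bar')` as before.

```python
import pandas as pd

daterange = pd.date_range("2017-02-20", "2017-02-23")
df = pd.DataFrame(
    index=daterange,
    data={"Male": [12, 23, 13, 11], "Female": [10, 25, 15, 9]},
)

# Copy the frame and swap the DatetimeIndex for plain date strings,
# so the bar plot's categorical labels read as YYYY-MM-DD.
bar_df = df.copy()
bar_df.index = df.index.strftime("%Y-%m-%d")
```

The original `df` keeps its DatetimeIndex, so the line plot is unaffected.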