dataframe

LaTeX symbols in pandas dataframe

Submitted by 妖精的绣舞 on 2021-01-28 19:06:47
Question: I have a dataframe whose columns and index I want to label with special LaTeX symbols. Something like this: pd.DataFrame({"a":[1,2,3],"b":[4,5,6]}), with the small difference that instead of a and b I want the columns to be the $\diamond$ and $\dagger$ symbols, respectively. Any kind of help would be appreciated. Source: https://stackoverflow.com/questions/53377554/latex-symbols-in-pandas-dataframe
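A minimal sketch of one way to do this, assuming the goal is simply to use raw LaTeX strings as column labels and keep them intact when exporting (the to_latex export step is an assumption about how the frame will be rendered):

```python
import pandas as pd

# Raw strings keep the backslashes; to pandas these are just ordinary labels.
df = pd.DataFrame({r"$\diamond$": [1, 2, 3], r"$\dagger$": [4, 5, 6]})

# escape=False preserves the LaTeX markup instead of escaping the
# backslashes and dollar signs in the output.
print(df.to_latex(escape=False))
```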

Filter rows based on one column's value and calculate percentage of sum in Pandas

Submitted by 泪湿孤枕 on 2021-01-28 18:56:06
Question: Given a small dataset as follows:

   value  input
0      3      0
1      4      1
2      3     -1
3      2      1
4      3     -1
5      5      0
6      1      0
7      1      1
8      1      1

I have used the following code: df['pct'] = df['value'] / df['value'].sum(). But I want to calculate pct while excluding input = -1, meaning that if the input value is -1, the corresponding value is neither included in the sum nor given a pct (rows 2 and 4 in this case). The expected result looks like this:

   value  input   pct
0      3      0  0.18
1      4      1  0.24
2      3     -1   NaN
3      2      1  0
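One way to get this, sketched with a boolean mask; the data and column names follow the question, while rounding to two decimals is an assumption based on the expected output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 3, 2, 3, 5, 1, 1, 1],
                   "input": [0, 1, -1, 1, -1, 0, 0, 1, 1]})

mask = df["input"] != -1                    # rows that should count
total = df.loc[mask, "value"].sum()         # sum excluding input == -1
df["pct"] = np.where(mask, df["value"] / total, np.nan)
print(df.round(2))
```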

How do I troubleshoot ValueError: array is of length %s, while the length of the DataFrame is %s?

Submitted by 半城伤御伤魂 on 2021-01-28 18:51:15
Question: I'm trying to follow the example in this notebook. As suggested in this GitHub thread, I've upped the ulimit to 9999, and I've already converted the CSV files to HDF5. My code fails when trying to open a single HDF5 file into a dataframe: df = vaex.open('data/chat_history_00.hdf5'). Here's the rest of the code:

import re
import glob
import vaex
import numpy as np

def tryint(s):
    try:
        return int(s)
    except:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks. "z23a"
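The helper is cut off in the excerpt; it appears to be the standard natural-sort recipe, so a likely completion looks like this (an assumption, since the original code is truncated):

```python
import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """Turn a string into a list of string and number chunks: "z23a" -> ["z", 23, "a"]."""
    return [tryint(chunk) for chunk in re.split(r"([0-9]+)", s)]

# Typical use: sort the converted HDF5 files in natural order before opening them.
files = sorted(["chat_history_10.hdf5", "chat_history_2.hdf5"], key=alphanum_key)
print(files)
```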

Appending data to a dataframe but changing rows after a certain # of columns

Submitted by 天大地大妈咪最大 on 2021-01-28 18:49:02
Question: Here is some code I've written that increments three variables used within p-value calculations, where the three variables are loc indices into the data:

i = 0
k = 2
j = 2
result = []
df = pd.DataFrame()
while j < data.shape[1]:
    tstat, data_stat = ttest_ind_from_stats(data.loc[i][k], data.loc[i + 1][k], data.loc[i + 2][k],
                                            data.loc[i][j], data.loc[i + 1][j], data.loc[i + 2][j])
    result.append([data_stat])
    j += 1
    if j == 8:
        j = 2
        i = i + 3
        if i ==
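The title asks how to start a new row after a fixed number of appended values; one way is to collect the results in a flat list and reshape it at the end. A small sketch of that idea (the stand-in values, the count of 6 per row, and the column labels are assumptions, since the question is truncated):

```python
import numpy as np
import pandas as pd

result = [0.01 * n for n in range(18)]   # stand-in for the collected p-values
cols_per_row = 6                         # assumed: j runs 2..7, so 6 values per group

rows = np.array(result).reshape(-1, cols_per_row)
df = pd.DataFrame(rows, columns=[f"p{j}" for j in range(2, 2 + cols_per_row)])
print(df)
```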

Group a dataframe by values that are less than a second apart - pandas

Submitted by £可爱£侵袭症+ on 2021-01-28 18:30:47
Question: Let's say I have a pandas dataframe as below:

>>> df = pd.DataFrame({'dt': pd.to_datetime(['2018-12-10 16:35:34.246', '2018-12-10 16:36:34.243', '2018-12-10 16:38:34.216', '2018-12-10 16:42:34.123']), 'value': [1, 2, 3, 4]})
>>> df
                       dt  value
0 2018-12-10 16:35:34.246      1
1 2018-12-10 16:36:34.243      2
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4

I would like to group this dataframe by the 'dt' column, but I want to group it in a way that treats values that are less than a second different
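The question is cut off, so the exact grouping rule is unclear; a common sketch for "consecutive rows less than a second apart belong together" assigns a group id from the gap to the previous row (an assumption about what is wanted here):

```python
import pandas as pd

df = pd.DataFrame({'dt': pd.to_datetime(['2018-12-10 16:35:34.246',
                                         '2018-12-10 16:36:34.243',
                                         '2018-12-10 16:38:34.216',
                                         '2018-12-10 16:42:34.123']),
                   'value': [1, 2, 3, 4]})

# Start a new group whenever the gap to the previous row is a second or more;
# rows less than a second apart therefore share a group id.
group_id = df['dt'].diff().gt(pd.Timedelta(seconds=1)).cumsum()
print(df.groupby(group_id)['value'].sum())
```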

Compute dataframe columns from a string formula in variables?

Submitted by 纵饮孤独 on 2021-01-28 18:25:36
Question: I use an Excel file in which I define the names of sensors and a formula that creates a new "synthetic" sensor from real sensors. I would like to write the formula as a string, for example "y1 + y2 + y3", rather than "df['y1'] + df['y2'] + df['y3']", but I don't see which method to use. Excel file example: My script must therefore create a new sensor for each line of this Excel file. This new sensor will then be uploaded to my database. The number of sensors to calculate the new
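One way to evaluate a column expression given as a plain string is pandas' eval; a minimal sketch (the sensor names and the formula string are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'y1': [1, 2], 'y2': [3, 4], 'y3': [5, 6]})

formula = "y1 + y2 + y3"             # formula string as it might come from the Excel file
df['synthetic'] = df.eval(formula)   # column names in the string resolve against df's columns
print(df)
```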

Merging two dataframes, removing duplicates, and aggregating in R

Submitted by Deadly on 2021-01-28 17:43:43
Question: I have two dataframes in R named house and candidates.

house
      House          Region Military_Strength
1     Stark       The North             20000
2 Targaryen    Slaver's Bay            110000
3 Lannister The Westerlands             60000
4 Baratheon  The Stormlands             40000
5    Tyrell       The Reach             30000

candidates
      House               Name  Region
1 Lannister    Jamie Lannister Westros
2     Stark         Robb Stark   North
3     Stark         Arya Stark Westros
4 Lannister    Cersi Lannister Westros
5 Targaryen Daenerys Targaryen Mereene
6 Baratheon   Robert Baratheon Westros
7   Mormont      Jorah Mormont

Pandas qcut based on expanding window of all columns

Submitted by 不问归期 on 2021-01-28 16:47:51
Question: Let's say I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.normal(0, 1, [100, 50]))

that looks like:

          0         1         2         3         4         5         6  \
0 -0.141305  2.158252  1.006520 -1.004185 -0.213160  0.648904 -0.089369
1 -1.373167 -1.100959  1.007023  0.699591 -1.667834  1.422182  0.940912
2 -0.212014  1.967436  0.401133 -0.996298 -1.696490 -0.857453 -0.686584
3 -0.351902  0.413816 -0.494869  0.448740  0.146897 -0.798095 -0.546489
4  0.416376 -0.689577 -0.967050 -1.667480  1.223966 -1.382113 -0.812368
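The body of the question is cut off, so the goal is an assumption; one reading of the title is that each row's values should be bucketed against the pool of every value observed up to and including that row, across all columns. A sketch under that assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, [100, 50]))

def expanding_qcut(frame, q=10):
    """Bucket each row against the expanding pool of all values seen so far."""
    out = pd.DataFrame(index=frame.index, columns=frame.columns, dtype=float)
    for i in range(len(frame)):
        pool = frame.iloc[: i + 1].to_numpy().ravel()         # all columns, rows 0..i
        edges = np.quantile(pool, np.linspace(0, 1, q + 1))   # quantile bin edges so far
        edges[0], edges[-1] = -np.inf, np.inf                 # open-ended outer bins
        out.iloc[i] = pd.cut(frame.iloc[i], bins=edges, labels=False, duplicates='drop')
    return out

labels = expanding_qcut(df)
print(labels.head())
```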
