dataframe

LaTeX symbols in pandas dataframe

Submitted by 妖精的绣舞 on 2021-01-28 19:06:47
Question: I have a dataframe whose columns and index I want to label with special LaTeX symbols. Something like this: pd.DataFrame({"a":[1,2,3],"b":[4,5,6]}), with the small difference that instead of a and b I want the columns to be the $\diamond$ and $\dagger$ symbols, respectively. Any kind of help would be appreciated. Source: https://stackoverflow.com/questions/53377554/latex-symbols-in-pandas-dataframe
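A minimal sketch of one way to do this, assuming the goal is simply to use raw LaTeX strings as column labels and keep them intact when exporting (the to_latex export step is an assumption about how the frame will be rendered):

```python
import pandas as pd

# Raw strings keep the backslashes; to pandas these are just ordinary labels.
df = pd.DataFrame({r"$\diamond$": [1, 2, 3], r"$\dagger$": [4, 5, 6]})

# escape=False preserves the LaTeX markup instead of escaping the
# backslashes and dollar signs in the output.
print(df.to_latex(escape=False))
```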

Filter rows based on one column's value and calculate percentage of sum in Pandas

Submitted by 泪湿孤枕 on 2021-01-28 18:56:06
Question: Given a small dataset as follows:

   value  input
0      3      0
1      4      1
2      3     -1
3      2      1
4      3     -1
5      5      0
6      1      0
7      1      1
8      1      1

I have used the following code: df['pct'] = df['value'] / df['value'].sum(). But I want to calculate pct while excluding input = -1, meaning that if the input value is -1, the corresponding value is neither included in the sum nor given a pct (rows 2 and 4 in this case). The expected result looks like this:

   value  input   pct
0      3      0  0.18
1      4      1  0.24
2      3     -1   NaN
3      2      1  0
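One way to get this, sketched with a boolean mask; the data and column names follow the question, while rounding to two decimals is an assumption based on the expected output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 3, 2, 3, 5, 1, 1, 1],
                   "input": [0, 1, -1, 1, -1, 0, 0, 1, 1]})

mask = df["input"] != -1                    # rows that should count
total = df.loc[mask, "value"].sum()         # sum excluding input == -1
df["pct"] = np.where(mask, df["value"] / total, np.nan)
print(df.round(2))
```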

How do I troubleshoot ValueError: array is of length %s, while the length of the DataFrame is %s?

Submitted by 半城伤御伤魂 on 2021-01-28 18:51:15
Question: I'm trying to follow the example in this notebook. As suggested in this GitHub thread, I've upped the ulimit to 9999, and I've already converted the CSV files to HDF5. My code fails when trying to open a single HDF5 file into a dataframe: df = vaex.open('data/chat_history_00.hdf5'). Here's the rest of the code:

import re
import glob
import vaex
import numpy as np

def tryint(s):
    try:
        return int(s)
    except:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks. "z23a"
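The helper is cut off in the excerpt; it appears to be the standard natural-sort recipe, so a likely completion looks like this (an assumption, since the original code is truncated):

```python
import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """Turn a string into a list of string and number chunks: "z23a" -> ["z", 23, "a"]."""
    return [tryint(chunk) for chunk in re.split(r"([0-9]+)", s)]

# Typical use: sort the converted HDF5 files in natural order before opening them.
files = sorted(["chat_history_10.hdf5", "chat_history_2.hdf5"], key=alphanum_key)
print(files)
```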

Appending data to a dataframe but changing rows after a certain # of columns

Submitted by 天大地大妈咪最大 on 2021-01-28 18:49:02
Question: Here is some code I've written that increments three variables used within p-value calculations, where the three variables are loc indices into the data:

i = 0
k = 2
j = 2
result = []
df = pd.DataFrame()
while j < data.shape[1]:
    tstat, data_stat = ttest_ind_from_stats(data.loc[i][k], data.loc[i + 1][k], data.loc[i + 2][k],
                                            data.loc[i][j], data.loc[i + 1][j], data.loc[i + 2][j])
    result.append([data_stat])
    j += 1
    if j == 8:
        j = 2
        i = i + 3
        if i ==
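The title asks how to start a new row after a fixed number of appended values; one way is to collect the results in a flat list and reshape it at the end. A small sketch of that idea (the stand-in values, the count of 6 per row, and the column labels are assumptions, since the question is truncated):

```python
import numpy as np
import pandas as pd

result = [0.01 * n for n in range(18)]   # stand-in for the collected p-values
cols_per_row = 6                         # assumed: j runs 2..7, so 6 values per group

rows = np.array(result).reshape(-1, cols_per_row)
df = pd.DataFrame(rows, columns=[f"p{j}" for j in range(2, 2 + cols_per_row)])
print(df)
```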

Group a dataframe by values that are less than a second apart - pandas

Submitted by £可爱£侵袭症+ on 2021-01-28 18:30:47
Question: Let's say I have a pandas dataframe as below:

>>> df = pd.DataFrame({'dt': pd.to_datetime(['2018-12-10 16:35:34.246', '2018-12-10 16:36:34.243', '2018-12-10 16:38:34.216', '2018-12-10 16:42:34.123']), 'value': [1, 2, 3, 4]})
>>> df
                       dt  value
0 2018-12-10 16:35:34.246      1
1 2018-12-10 16:36:34.243      2
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4

I would like to group this dataframe by the 'dt' column, but I want to group it in a way that treats values that are less than a second different
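The question is cut off, so the exact grouping rule is unclear; a common sketch for "consecutive rows less than a second apart belong together" assigns a group id from the gap to the previous row (an assumption about what is wanted here):

```python
import pandas as pd

df = pd.DataFrame({'dt': pd.to_datetime(['2018-12-10 16:35:34.246',
                                         '2018-12-10 16:36:34.243',
                                         '2018-12-10 16:38:34.216',
                                         '2018-12-10 16:42:34.123']),
                   'value': [1, 2, 3, 4]})

# Start a new group whenever the gap to the previous row is a second or more;
# rows less than a second apart therefore share a group id.
group_id = df['dt'].diff().gt(pd.Timedelta(seconds=1)).cumsum()
print(df.groupby(group_id)['value'].sum())
```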

Compute dataframe columns from a string formula in variables?

Submitted by 纵饮孤独 on 2021-01-28 18:25:36
Question: I use an Excel file in which I define the names of sensors and a formula that creates a new "synthetic" sensor from real sensors. I would like to write the formula as a string, for example "y1 + y2 + y3", rather than "df['y1'] + df['y2'] + df['y3']", but I don't see which method to use. Excel file example: My script must therefore create a new sensor for each line of this Excel file. This new sensor will then be uploaded to my database. The number of sensors to calculate the new
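One way to evaluate a column expression given as a plain string is pandas' eval; a minimal sketch (the sensor names and the formula string are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'y1': [1, 2], 'y2': [3, 4], 'y3': [5, 6]})

formula = "y1 + y2 + y3"             # formula string as it might come from the Excel file
df['synthetic'] = df.eval(formula)   # column names in the string resolve against df's columns
print(df)
```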

Merging two dataframes, removing duplicates, and aggregating in R

Submitted by Deadly on 2021-01-28 17:43:43
Question: I have two dataframes in R named house and candidates.

house
      House          Region Military_Strength
1     Stark       The North             20000
2 Targaryen    Slaver's Bay            110000
3 Lannister The Westerlands             60000
4 Baratheon  The Stormlands             40000
5    Tyrell       The Reach             30000

candidates
      House               Name  Region
1 Lannister    Jamie Lannister Westros
2     Stark         Robb Stark   North
3     Stark         Arya Stark Westros
4 Lannister    Cersi Lannister Westros
5 Targaryen Daenerys Targaryen Mereene
6 Baratheon   Robert Baratheon Westros
7   Mormont      Jorah Mormont

Pandas qcut based on expanding window of all columns

Submitted by 不问归期 on 2021-01-28 16:47:51
Question: Let's say I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.normal(0, 1, [100, 50]))

that looks like:

          0         1         2         3         4         5         6  \
0 -0.141305  2.158252  1.006520 -1.004185 -0.213160  0.648904 -0.089369
1 -1.373167 -1.100959  1.007023  0.699591 -1.667834  1.422182  0.940912
2 -0.212014  1.967436  0.401133 -0.996298 -1.696490 -0.857453 -0.686584
3 -0.351902  0.413816 -0.494869  0.448740  0.146897 -0.798095 -0.546489
4  0.416376 -0.689577 -0.967050 -1.667480  1.223966 -1.382113 -0.812368
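The body of the question is cut off, so the goal is an assumption; one reading of the title is that each row's values should be bucketed against the pool of every value observed up to and including that row, across all columns. A sketch under that assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, [100, 50]))

def expanding_qcut(frame, q=10):
    """Bucket each row against the expanding pool of all values seen so far."""
    out = pd.DataFrame(index=frame.index, columns=frame.columns, dtype=float)
    for i in range(len(frame)):
        pool = frame.iloc[: i + 1].to_numpy().ravel()         # all columns, rows 0..i
        edges = np.quantile(pool, np.linspace(0, 1, q + 1))   # quantile bin edges so far
        edges[0], edges[-1] = -np.inf, np.inf                 # open-ended outer bins
        out.iloc[i] = pd.cut(frame.iloc[i], bins=edges, labels=False, duplicates='drop')
    return out

labels = expanding_qcut(df)
print(labels.head())
```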
