Question
Here is some code I've written. It steps three counters (i, j and k) that index the rows and columns of data, and uses the selected values in p-value calculations:
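import pandas as pd
from scipy.stats import ttest_ind_from_stats

i = 0  # first row of the current 3-row group
k = 2  # column of the first sample
j = 2  # column of the second sample
result = []
df = pd.DataFrame()
while j < data.shape[1]:
    tstat, data_stat = ttest_ind_from_stats(
        data.loc[i][k], data.loc[i + 1][k], data.loc[i + 2][k],
        data.loc[i][j], data.loc[i + 1][j], data.loc[i + 2][j])
    result.append([data_stat])
    j += 1
    if j == 8:
        # all columns compared for this group: move to the next 3 rows
        j = 2
        i = i + 3
        if i == data.shape[0]:
            # all row groups done: move to the next reference column
            k = k + 1
            i = 0
            if k > 7:
                break
data_result = pd.DataFrame(result)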
Here data.shape[0] = 150 and data.shape[1] = 8.
This code produces the correct p-values, but as a single dataframe of 1800 rows x 1 column. I would like to break up the result so that the code produces six different dataframes, each with data.shape[1] - 2 columns (so 6 columns). Some examples:
1) The data_result dataframe from my current code:
1
0.658
0.1067
0.777
0.459
0.3307
1
0.622
0.4178
0.3158
0.7674
0.7426
2) What I want:
col1 col2 col3 col4 col5 col6
1 0.658 0.1067 0.777 0.459 0.3307
1 0.622 0.4178 0.3158 0.7674 0.7426
There should be six of the above dataframes from the code.
3) I would then preferably add a column to the left of each dataframe, which would hold placeholder values for each row (screenshot omitted). This step is optional.
So basically, I want to split the resulting dataframe every 6 rows, transpose each chunk from a single column into a row of six columns, then repeat for the next six values, and so on. I thought about building a Series or a new df until j = 8 and then appending it to the overall df as a row, but wasn't sure whether this would work or even be possible. Thanks!
Edit:
So basically, I want to create six separate dataframes, each with a shape of 50 rows x 6 columns. My current dataframe has 1800 rows x 1 column.
Answer 1:
For point 2, you can try it with NumPy:
import numpy as np
import pandas as pd
result_array = np.asarray(result)
# reshape to 300 rows and 6 columns (1800 values / 6); reshape returns a new array
result_array = result_array.reshape(300, 6)
# if the number of rows is not known in advance, let NumPy infer it:
# result_array = result_array.reshape(-1, 6)
data_result = pd.DataFrame(result_array)
For point 3, I'm not sure I understand it, but once you have the resulting dataframe you can do anything that pandas allows.
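One possible reading of point 3 is simply adding a new leftmost column, which `DataFrame.insert` can do. A minimal sketch, assuming the frame already has its six columns; the column name `label`, the placeholder values, and the stand-in data are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reshaped p-values: 12 values -> 2 rows x 6 columns.
p_values = np.arange(12, dtype=float).reshape(-1, 6)
frame = pd.DataFrame(p_values, columns=[f"col{n}" for n in range(1, 7)])

# Insert a placeholder column at position 0, i.e. to the left of col1.
# "label" and its values are assumptions, not taken from the question.
frame.insert(0, "label", [f"row_{r}" for r in range(len(frame))])
```

`insert` modifies the frame in place, so there is nothing to reassign.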
Answer 2:
This would get you the df you need (credit should go to Renaud):
a = np.array(df)
b = a.reshape(df.shape[0] // 6, 6)
df_new = pd.DataFrame(b)
df_new.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
df_new
Output
col1 col2 col3 col4 col5 col6
0 1.0 0.658 0.106743 0.7770 0.4590 0.3307
1 1.0 0.622 0.417800 0.3158 0.7674 0.7426
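The reshape above gives a single frame with one row per six values. To get the six separate 50 x 6 frames the question asks for, one possibility is to reshape to three dimensions first and build a frame from each block. This is only a sketch: it assumes the 1800 results are in the order the question's loop produces them (j changes fastest, then i, then k), and the random numbers stand in for the real p-values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 1800 p-values collected in `result`.
rng = np.random.default_rng(0)
flat = rng.random(1800)

n_frames, n_cols = 6, 6
# Shape becomes (6, 50, 6); -1 lets NumPy infer the 50 rows per frame.
blocks = flat.reshape(n_frames, -1, n_cols)

columns = [f"col{n}" for n in range(1, n_cols + 1)]
# One DataFrame per block, i.e. one per value of k in the original loop.
frames = [pd.DataFrame(block, columns=columns) for block in blocks]
```

Each entry of `frames` then holds 50 rows (one per 3-row group of `data`) and six columns, and `frames[0]` corresponds to k = 2.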
Source: https://stackoverflow.com/questions/59817359/appending-data-to-a-dataframe-but-changing-rows-after-certain-of-columns