Question
Here are the first 11 columns of my dataframe:
import pandas as pd
df = pd.DataFrame({
'0': [373.60],
'1': [442.83],
'2': [259.21],
'3': [293.05],
'4': [332.79],
'5': [360.03],
'6': [676.55],
'7': [481.67],
'8': [486.59],
'9': [561.65],
'10': [491.75]})
And so on; my actual df contains 100000 columns. The minimum value is 109.59 and the maximum is 1703.35.
I want to split the values of df into consecutive ranges of length 3.98 and then find the range that contains the most values. The ranges should be built like this:
import numpy as np
# convert df to an array
df_array = np.array(df)
# define the range boundaries like this:
range_length = 3.98
range_1 = df_array.min() + range_length
range_2 = range_1 + range_length
...
# and so on, until the last range reaches df_array.max()
Then I would see that some range, say range_150, contains about 1200 values, which makes it the most populated range, the one I need.
After that I need to find the index of each value from that range in my df.
I really have no idea how to do that. It looks like I need to write several functions. Can somebody help, please?
Answer 1:
This gives you the number of entries in each range:
import numpy as np
# pad the minimum and maximum by 5 so that both are sure to fall inside a bin
ranges = np.arange(df.T.min()[0] - 5, df.T.max()[0] + 5, 3.98)
# observed=False keeps the empty bins in the result
df_count = df.T.groupby(pd.cut(df.T[0], ranges), observed=False).count()
df_count
                  0
0
(254.21, 258.19]  0
(258.19, 262.17]  1
(262.17, 266.15]  0
(266.15, 270.13]  0
(270.13, 274.11]  0
..
(660.17, 664.15]  0
(664.15, 668.13]  0
(668.13, 672.11]  0
(672.11, 676.09]  0
(676.09, 680.07]  1
[107 rows x 1 columns]
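For a frame as wide as the 100000 columns mentioned in the question, the same counting can also be sketched with NumPy alone. This is a minimal sketch, not part of the original answer. It assumes the data sits in a single row and places the first bin edge at the minimum (as the question describes) instead of padding by 5. The names values, bin_width, n_bins, busiest, and labels are illustrative. Note that np.histogram uses half-open bins [a, b), unlike the (a, b] bins of pd.cut, and that with only the 11 sample values every occupied bin holds a single value, so the first one is reported.

```python
import numpy as np
import pandas as pd

# Sample row from the question (the real frame has 100000 columns).
df = pd.DataFrame({
    '0': [373.60], '1': [442.83], '2': [259.21], '3': [293.05],
    '4': [332.79], '5': [360.03], '6': [676.55], '7': [481.67],
    '8': [486.59], '9': [561.65], '10': [491.75]})

values = df.iloc[0].to_numpy()   # the single row as a 1-D float array
bin_width = 3.98

# Enough bins so that the last edge lies strictly above the maximum.
n_bins = int(np.floor((values.max() - values.min()) / bin_width)) + 1
edges = values.min() + bin_width * np.arange(n_bins + 1)

# counts[i] is the number of values in [edges[i], edges[i + 1]).
counts, _ = np.histogram(values, bins=edges)

busiest = int(np.argmax(counts))   # first bin holding the highest count
low, high = edges[busiest], edges[busiest + 1]

# Column labels whose value falls in the busiest bin.
mask = (values >= low) & (values < high)
labels = df.columns[mask]
print(f"[{low:.2f}, {high:.2f}): {counts[busiest]} value(s), columns {list(labels)}")
```

Computing the edges as minimum plus a multiple of the width, rather than with a floating-point np.arange stop value, is a deliberate choice here: it guarantees that the maximum lands inside the last bin.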
This gives you the index (that is, the range) with the most hits:
df_count.idxmax()
0 (258.19, 262.17]
dtype: object
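You can select the entries that fall in this range like this (note that between includes both endpoints by default, while the bin itself excludes its left edge):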
df.T[df.T[0].between(258.19, 262.17)]
        0
2  259.21
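The bounds 258.19 and 262.17 in the selection above were copied by hand from the idxmax output. As a minimal sketch (not part of the original answer), the lookup can be done without hardcoding them by keeping the bin assignment from pd.cut and comparing it with the winning interval. The names row, binned, best, and columns_in_best are illustrative, and the bin edges are assumed to be the same as in the answer.

```python
import numpy as np
import pandas as pd

# Sample row from the question (the real frame has 100000 columns).
df = pd.DataFrame({
    '0': [373.60], '1': [442.83], '2': [259.21], '3': [293.05],
    '4': [332.79], '5': [360.03], '6': [676.55], '7': [481.67],
    '8': [486.59], '9': [561.65], '10': [491.75]})

row = df.iloc[0]   # Series: index = column labels, values = floats
edges = np.arange(row.min() - 5, row.max() + 5, 3.98)   # same edges as in the answer

binned = pd.cut(row, edges)                 # one interval (a, b] per column
counts = binned.value_counts(sort=False)    # occupancy per bin, kept in bin order
best = counts.idxmax()                      # interval with the most values (first one on ties)

# Column labels whose value was assigned to the winning interval.
columns_in_best = row[binned == best].index

print(best, list(columns_in_best))
print(df[columns_in_best])   # the matching columns of the original frame
```

Comparing against the interval that pd.cut itself assigned avoids any mismatch between the rounded bounds shown in the output and the exact edges used for binning.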
Hope this helps.
Source: https://stackoverflow.com/questions/59825370/how-to-extract-ranges-with-specific-length-from-dataframe-row-in-python