pandas

Python pandas: how to obtain the datatypes of objects in a mixed-datatype column?

两盒软妹~` submitted on 2021-02-20 09:29:07
Question: Given a pandas.DataFrame with a column holding mixed datatypes, e.g.

    df = pd.DataFrame({'mixed': [pd.Timestamp('2020-10-04'), 999, 'a string']})

how can I obtain the datatypes of the individual objects in the column (Series)? Suppose I want to modify all entries in the Series that are of a certain type, for example multiply all integers by some factor. I could iteratively derive a mask and use it in loc, like

    m = np.array([isinstance(v, int) for v in df['mixed']])
    df.loc[m, 'mixed
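One way to approach this, sketched here on the question's own example frame: map `type` over the column to see each object's type, and build the mask with `Series.map` instead of an explicit NumPy array. The `factor` value is a hypothetical placeholder, and excluding `bool` is an extra precaution that the question does not ask for.

```python
import pandas as pd

df = pd.DataFrame({"mixed": [pd.Timestamp("2020-10-04"), 999, "a string"]})

# Type of each individual object stored in the object-dtype column.
types = df["mixed"].map(type)

# Boolean mask of integer entries; bool is excluded because it subclasses int.
is_int = df["mixed"].map(lambda v: isinstance(v, int) and not isinstance(v, bool))

factor = 10  # hypothetical factor, not from the question
df.loc[is_int, "mixed"] = df.loc[is_int, "mixed"] * factor

print(types)
print(df)
```

Only the row holding 999 is selected by the mask; the Timestamp and the string are left untouched.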

pandas rolling() function with monthly offset

一个人想着一个人 submitted on 2021-02-20 09:27:25
Question: I'm trying to use the rolling() function on a pandas DataFrame with monthly data. However, I dropped some NaN values, so there are now gaps in my time series. As a result, the basic window parameter gives a misleading answer, since it just looks at the previous observation:

    import pandas as pd
    import numpy as np
    import random
    dft = pd.DataFrame(np.random.randint(0,10,size=len(dt)),index=dt)
    dft.columns = ['value']
    dft['value'] = np.where(dft['value'] < 3,np.nan,dft['value'])
    dft = dft
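The problem described above can be reproduced on a small series. The sketch below uses made-up data (a hypothetical month-start index and values 1 to 12, not the question's `dt`) and shows one possible workaround: restore the regular monthly grid with `asfreq` before rolling, so that an integer window again corresponds to calendar months.

```python
import pandas as pd

# Hypothetical monthly series (month-start dates) standing in for the question's data.
idx = pd.date_range("2020-01-01", periods=12, freq="MS")
s = pd.Series(range(1, 13), index=idx, dtype="float64")

# Drop March and April to create a gap, as happens after dropping NaN rows.
gapped = s.drop(idx[[2, 3]])

# Naive: the window counts the last 3 observations, so May reaches back to January.
naive = gapped.rolling(3, min_periods=1).mean()

# Restore the regular monthly grid first; the missing months come back as NaN,
# and the same window now spans 3 calendar months.
regular = gapped.asfreq("MS").rolling(3, min_periods=1).mean()

print(naive.loc["2020-05-01"], regular.loc["2020-05-01"])
```

For May, the naive window averages January, February and May, while the regularized window only sees May because March and April are NaN.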

efficient way to find several rows above and below a subset of data

送分小仙女□ submitted on 2021-02-20 09:04:08
Question: I'm wondering if there's an efficient way to get X rows above and below a subset of rows. I've created a basic implementation below, but I'm sure there's a better way. The subset that I care about is buyindex, the indices of rows that have the buy signal. I want to get several rows above and below the sellindex to verify that my algorithm is working correctly. How can I do this efficiently? My way seems roundabout.

    buyindex = list(data2[data2['buy'] == True].index)
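A possible vectorized approach, sketched under assumptions: `data2` here is a made-up ten-row frame with a boolean `buy` column, and `x` is the hypothetical number of context rows on each side. The idea is to work with integer positions, broadcast an offset range over them, and select everything in one `iloc` call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data2: rows 3 and 7 carry the buy signal.
data2 = pd.DataFrame({
    "price": range(10),
    "buy": [i in (3, 7) for i in range(10)],
})

x = 1  # hypothetical number of rows to show above and below each signal

# Integer positions of the flagged rows.
pos = np.flatnonzero(data2["buy"].to_numpy())

# Broadcast offsets -x..x over every position, clip to the valid range, deduplicate.
window = np.unique(np.clip(pos[:, None] + np.arange(-x, x + 1), 0, len(data2) - 1))

context = data2.iloc[window]
print(context)
```

Because positions are used instead of labels, this works regardless of what the index contains, and overlapping windows are merged by `np.unique`.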

Pandas. How to read Excel file from ZIP archive

只谈情不闲聊 submitted on 2021-02-20 06:18:16
Question: I have a .zip archive with filename.xlsx inside it, and I want to parse the Excel sheet line by line. How do I properly pass the filename into pandas.read_excel in this case? I tried:

    import zipfile
    import pandas
    myzip=zipfile.ZipFile(filename.zip)
    for fname in myzip.namelist():
        with myzip.open(fname) as from_archive:
            with pandas.read_excel(from_archive) as fin:
                for line in fin:
                    ....

but it doesn't seem to work, and the result was:

    AttributeError: __exit__

Answer 1: You can extract your zip-file into a variable
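The error arises because `pandas.read_excel` returns a DataFrame, which is not a context manager and so cannot be used in a `with` statement. A minimal sketch of the "extract into a variable" idea follows; the helper name `read_excel_from_zip` is hypothetical, and reading .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
import io
import zipfile

import pandas as pd


def read_excel_from_zip(zip_source, member):
    """Return the first sheet of `member` (an .xlsx inside the archive) as a DataFrame.

    `zip_source` may be a path or a file-like object; this helper is illustrative only.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Read the member fully into memory so pandas gets a seekable buffer.
        buffer = io.BytesIO(archive.read(member))
    return pd.read_excel(buffer)
```

The result is an ordinary DataFrame, so rows can then be processed one at a time, for example with `for row in df.itertuples(index=False): ...`.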

plot dataframe with two y-axes

ⅰ亾dé卋堺 submitted on 2021-02-20 06:16:57
Question: I have the following dataframe:

       land_cover          1         2         3         4         5         6       size
    0          20  19.558872  6.856950  3.882243  1.743048  1.361306  1.026382  16.520265
    1          30   9.499454  3.513521  1.849498  0.836386  0.659660  0.442690   8.652517
    2          40  10.173790  3.123167  1.677257  0.860317  0.762718  0.560290  11.925280
    3          50  10.098777  1.564575  1.280729  0.894287  0.884028  0.887448  12.647710
    4          60   6.166109  1.588687  0.667839  0.230659  0.143044  0.070628   2.160922
    5         110  17.846565  3.884678  2.202129  1.040551  0.843709  0.673298  30.406541

I want to plot the
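The question text is cut off, so the exact plot the asker wants is unknown. As a rough sketch of the two-y-axes technique named in the title, the example below assumes that the numbered columns go on the left axis and `size` on the right, uses only columns `1` and `2` from the first three rows to stay short, and treats the column names as strings. The axis labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the question's table, restricted to a few columns.
df = pd.DataFrame({
    "land_cover": [20, 30, 40],
    "1": [19.558872, 9.499454, 10.173790],
    "2": [6.856950, 3.513521, 3.123167],
    "size": [16.520265, 8.652517, 11.925280],
})

fig, ax_left = plt.subplots()
df.plot(x="land_cover", y=["1", "2"], ax=ax_left)
ax_left.set_ylabel("columns 1 and 2")  # placeholder label

# A second y-axis that shares the same x-axis.
ax_right = ax_left.twinx()
df.plot(x="land_cover", y="size", ax=ax_right, color="black", linestyle="--")
ax_right.set_ylabel("size")

fig.savefig("two_axes.png")
```

pandas also offers a `secondary_y` argument on `DataFrame.plot` for the same purpose; `twinx` is used here because it keeps both axes explicit.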

Pandas count values inside dataframe

≡放荡痞女 submitted on 2021-02-20 05:00:32
Question: I have a dataframe that looks like this:

       A  B  C
    1  1  8  3
    2  5  4  3
    3  5  8  1

and I want to count the values to make a df like this:

       total
    1      2
    3      2
    4      1
    5      2
    8      2

Is this possible with pandas?

Answer 1: With np.unique -

    In [332]: df
    Out[332]:
       A  B  C
    1  1  8  3
    2  5  4  3
    3  5  8  1

    In [333]: ids, c = np.unique(df.values.ravel(), return_counts=1)

    In [334]: pd.DataFrame({'total':c}, index=ids)
    Out[334]:
       total
    1      2
    3      2
    4      1
    5      2
    8      2

With pandas-series -

    In [357]: pd.Series(np.ravel(df)).value_counts().sort_index()
    Out[357]:
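For readers who want to run this outside an IPython session, here is a standalone sketch that rebuilds the question's frame and applies the value_counts approach, with a final `to_frame` step (an addition, not part of the quoted answer) to get the `total` column the asker requested.

```python
import pandas as pd

# The question's frame, rebuilt by hand.
df = pd.DataFrame(
    {"A": [1, 5, 5], "B": [8, 4, 8], "C": [3, 3, 1]},
    index=[1, 2, 3],
)

# Flatten all cells into one Series, count each value, and order by value.
total = (
    pd.Series(df.to_numpy().ravel())
      .value_counts()
      .sort_index()
      .to_frame("total")
)
print(total)
```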

Count of a value in consecutive timestamp in pandas

和自甴很熟 submitted on 2021-02-20 04:45:28
Question:

    Hour              Site
    01/08/2020 00:00  A
    01/08/2020 00:00  B
    01/08/2020 00:00  C
    01/08/2020 00:00  D
    01/08/2020 01:00  A
    01/08/2020 01:00  B
    01/08/2020 01:00  E
    01/08/2020 01:00  F
    01/08/2020 02:00  A
    01/08/2020 02:00  E
    01/08/2020 03:00  C
    01/08/2020 03:00  G
    …..
    01/08/2020 04:00  x
    01/08/2020 04:00  s
    …..
    01/08/2020 23:00  G
    02/08/2020 00:00  G

I have a dataframe like the one above. I want to count how many consecutive hours each site appears in, together with the start and end timestamp of each run; note that each hour contains multiple sites. For
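Since the question is cut off before the expected output, the following is only a sketch of one common way to find consecutive runs, using the first twelve rows of the table. It assumes the dates are day-first (dd/mm/yyyy), which is what makes 02/08/2020 00:00 follow 01/08/2020 23:00; the column names `run`, `start`, `end` and `hours` are illustrative.

```python
import pandas as pd

# First twelve rows of the question's table: hour -> sites seen in that hour.
hours = {"00:00": "ABCD", "01:00": "ABEF", "02:00": "AE", "03:00": "CG"}
rows = [(f"01/08/2020 {h}", site) for h, sites in hours.items() for site in sites]
df = pd.DataFrame(rows, columns=["Hour", "Site"])

# Assumption: day-first dates.
df["Hour"] = pd.to_datetime(df["Hour"], format="%d/%m/%Y %H:%M")

# Within each site, a new run starts whenever the gap to the previous row is not 1 hour
# (the first row of every site has no previous row and therefore also starts a run).
df = df.sort_values(["Site", "Hour"])
new_run = df.groupby("Site")["Hour"].diff() != pd.Timedelta(hours=1)
df["run"] = new_run.cumsum()

runs = (
    df.groupby(["Site", "run"])
      .agg(start=("Hour", "min"), end=("Hour", "max"), hours=("Hour", "size"))
      .reset_index()
      .drop(columns="run")
)
print(runs)
```

On this sample, site A forms one run of three hours (00:00 to 02:00), while site C appears in two separate one-hour runs.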
