Pandas interpolate data with units

Anonymous (unverified), submitted on 2019-12-03 03:10:03

Question:

Hi Everyone,

I've been reading Stack Overflow for a couple of years, and it has helped me so much that I never had to register before :)

But today I'm stuck on a problem using Python with Pandas and Quantities (it could be unum or pint as well). I'll do my best to write a clear post, but since it's my first one, I apologize if something is confusing, and I'll try to correct any mistakes you find :)


I want to import data from a source and build a Pandas dataframe as follows:

import pandas as pd
import quantities as pq

depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m

s1 = pd.DataFrame(
        {'depth' : [x for x in depth]},
        index = depth)

This gives:

S1=
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

Now I want to extend the data to the depth2 values (obviously there is no point in interpolating depth over depth, but it's a test before things get more complicated):

s2 = s1.reindex(depth2) 

This gives:

S2=
      depth
0.0   0.0 m
1.0   NaN
1.1   1.1 m
1.5   NaN
2.0   2.0 m

So far no problem.


But when I try to interpolate the missing values doing:

s2['depth'].interpolate(method='values') 

I get the following error:

C:\Python27\lib\site-packages\numpy\lib\function_base.pyc in interp(x, xp, fp, left, right)
   1067         return compiled_interp([x], xp, fp, left, right).item()
   1068     else:
-> 1069         return compiled_interp(x, xp, fp, left, right)
   1070
   1071

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

I understand that NumPy's interpolation does not work on object-dtype arrays, which is what a column of Quantity objects becomes.
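A minimal sketch of that failure, assuming only NumPy is installed. Plain floats stored in an object-dtype array stand in for the Quantity column here (that substitution is an assumption made to keep the example self-contained). The point is that `np.interp` wants float64 data and will not cast from object dtype on its own.

```python
import numpy as np

xp = np.array([0.0, 1.1, 2.0])                 # known depths (the original index)
x_new = np.array([0.0, 1.0, 1.1, 1.5, 2.0])    # depths to interpolate at

# Object-dtype values, standing in for a column of Quantity objects.
fp_object = np.array([0.0, 1.1, 2.0], dtype=object)
try:
    np.interp(x_new, xp, fp_object)
    object_error = None
except TypeError as err:
    # The object -> float64 cast is refused, as in the traceback above.
    object_error = str(err)

# The same values as plain floats can be interpolated.
fp_float = fp_object.astype(float)
interpolated = np.interp(x_new, xp, fp_float)
```

So any fix has to hand NumPy plain floats and deal with the units separately.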


But if I drop the units before interpolating the missing values, it works:

s3 = s2['depth'].astype(float).interpolate(method='values') 

This gives:

s3 =
0.0   0
1.0   1
1.1   1.1
1.5   1.5
2.0   2
Name: depth, dtype: object

How can I get back the unit in the depth column?

I can't find any trick to put back the unit...

Any help will be greatly appreciated. Thanks

Answer 1:

Here's a way to do what you want.

Split apart the quantities and create a set of 2 columns for each quantity

# concat and Series are assumed to be imported from pandas
In [80]: df = concat([ col.apply(lambda x: Series([x.item(),x.dimensionality.string],
                       index=[c,"%s_unit" % c])) for c,col in s1.iteritems() ])

In [81]: df
Out[81]:
     depth depth_unit
0.0    0.0          m
1.1    1.1          m
2.0    2.0          m

In [82]: df = df.reindex([0,1.0,1.1,1.5,2.0])

In [83]: df
Out[83]:
     depth depth_unit
0.0    0.0          m
1.0    NaN        NaN
1.1    1.1          m
1.5    NaN        NaN
2.0    2.0          m

Interpolate

In [84]: df['depth'] = df['depth'].interpolate(method='values') 

Propagate the units

In [85]: df['depth_unit'] = df['depth_unit'].ffill()

In [86]: df
Out[86]:
     depth depth_unit
0.0    0.0          m
1.0    1.0          m
1.1    1.1          m
1.5    1.5          m
2.0    2.0          m
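The same three steps (split, interpolate, forward-fill the unit) can be sketched as a standalone script. This is a sketch under assumptions: it skips the Quantities package and starts from magnitudes and unit strings already split into two columns, so it only illustrates the pandas side of the approach.

```python
import pandas as pd

# Magnitudes and unit strings kept in separate columns.
# Plain floats and strings stand in for the split Quantity objects.
df = pd.DataFrame(
    {"depth": [0.0, 1.1, 2.0], "depth_unit": ["m", "m", "m"]},
    index=[0.0, 1.1, 2.0],
)

# Extend to the new depths; rows that did not exist before become NaN.
df = df.reindex([0.0, 1.0, 1.1, 1.5, 2.0])

# Interpolate the float column against the index values.
df["depth"] = df["depth"].interpolate(method="values")

# Carry the unit string forward into the new rows.
df["depth_unit"] = df["depth_unit"].ffill()
```

Forward-filling the unit is only safe when each column holds a single unit. With mixed units the magnitudes would need converting to a common unit before interpolating.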


Answer 2:

OK, I found a solution. It might not be the best one, but it works just fine for my problem:

import pandas as pd
import quantities as pq

def extendAndInterpolate(input, newIndex):
    """ Function to extend a pandas dataframe and interpolate """
    output = pd.concat([input, pd.DataFrame(index=newIndex)], axis=1)

    for col in output.columns:
        # (1) Try to retrieve the unit of the current column
        try:
            # if it succeeds, then store the unit
            unit = 1 * output[col][0].units
        except Exception:
            # if it fails, which means that the column contains strings,
            # then use 1 (no unit)
            unit = 1

        # (2) Check the type of value.
        # Note: basestring is Python 2 only; use str on Python 3.
        if isinstance(output[col][0], basestring):
            # if it's a string, fill the missing cells with this string
            value = output[col].ffill()
        else:
            # if it's a value, to be able to interpolate, you need to:
            #   - (a) drop the unit with astype(float)
            #   - (b) interpolate the value
            #   - (c) add the unit back
            value = [x*unit for x in output[col].astype(float).interpolate(method='values')]

        # (3) Store the interpolated values in the extended pandas table
        output[col] = pd.Series(value, index=output.index)

    # Return the output dataframe
    return output

Then:

depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m

s1 = pd.DataFrame(
        {'depth' : [x for x in depth]},
        index = depth)

s2 = extendAndInterpolate(s1, depth2)

The result:

s1
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

s2
     depth
0.0  0.0 m
1.0  1.0 m
1.1  1.1 m
1.5  1.5 m
2.0  2.0 m
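For readers without the Quantities package, the strip / interpolate / reattach idea can be sketched for a single column as below. Everything unit-related is hypothetical: `Qty` is a tiny stand-in class for a unit-aware scalar, not part of any library, and `extend_and_interpolate` is an illustrative name. The sketch assumes the column carries one unit throughout.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Qty:
    """Hypothetical stand-in for a unit-aware scalar (not a real library type)."""
    magnitude: float
    unit: str


def extend_and_interpolate(series, new_index):
    """Reindex a Series of Qty values and linearly fill the gaps."""
    # Assumption: every value in the column shares the unit of the first one.
    unit = series.iloc[0].unit
    # (a) drop the unit so that NumPy sees plain floats
    magnitudes = series.map(lambda q: q.magnitude).astype(float)
    # (b) extend to the new index and interpolate against the index values
    filled = magnitudes.reindex(new_index).interpolate(method="values")
    # (c) reattach the unit to every value, old and new
    return filled.map(lambda m: Qty(m, unit))


s1 = pd.Series(
    [Qty(0.0, "m"), Qty(1.1, "m"), Qty(2.0, "m")],
    index=[0.0, 1.1, 2.0],
    dtype=object,
)
s2 = extend_and_interpolate(s1, [0.0, 1.0, 1.1, 1.5, 2.0])
```

With a real unit library, step (c) would multiply each float by the stored unit instead of constructing `Qty`, as the function above does with `x*unit`.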

Thanks for your help.


