Hi everyone,
I've been reading Stack Overflow for a couple of years, and it has helped me so much that I never needed to register before :)
But today I'm stuck on a problem using Python with Pandas and Quantities (it could be unum or pint as well). I'll do my best to write a clear post, but since it's my first one, I apologize if anything is confusing and will correct any mistakes you find :)
I want to import data from a source and build a Pandas dataframe as follows:
```python
import pandas as pd
import quantities as pq

depth = [0.0, 1.1, 2.0] * pq.m
depth2 = [0, 1, 1.1, 1.5, 2] * pq.m
s1 = pd.DataFrame({'depth': [x for x in depth]}, index=depth)
```
This gives:
```
s1 =
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m
```
Now I want to extend the data to the depth2 values (obviously there is no point in interpolating depth over depth, but it's a test before things get more complicated):
```python
s2 = s1.reindex(depth2)
```
This gives:
```
s2 =
     depth
0.0  0.0 m
1.0    NaN
1.1  1.1 m
1.5    NaN
2.0  2.0 m
```
So far no problem.
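`reindex` looks up each new label exactly: labels that were already in the index keep their values, and all others become NaN. Below is a minimal pure-Python sketch of that lookup. The dict-based `reindex` helper is hypothetical, with `None` standing in for NaN. It also shows why float labels deserve care: a label produced by arithmetic can differ from the stored one in the last bit and then silently comes back as missing.

```python
from typing import Dict, List, Optional, Sequence


def reindex(data: Dict[float, float], new_index: Sequence[float]) -> List[Optional[float]]:
    """Hypothetical stand-in for Series.reindex: exact label lookup,
    with None in place of NaN for labels not present in `data`."""
    return [data.get(label) for label in new_index]


s1 = {0.0: 0.0, 1.1: 1.1, 2.0: 2.0}
print(reindex(s1, [0.0, 1.0, 1.1, 1.5, 2.0]))  # [0.0, None, 1.1, None, 2.0]

# 0.1 * 3 evaluates to 0.30000000000000004, so it does not match the label 0.3
print(reindex({0.3: 3.0}, [0.1 * 3]))  # [None]
```

If the depths come from a real data source, rounding both indexes to a fixed number of decimals before reindexing is one simple way to avoid such spurious gaps.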
But when I try to interpolate the missing values by doing:
```python
s2['depth'].interpolate(method='values')
```
I got the following error:
```
C:\Python27\lib\site-packages\numpy\lib\function_base.pyc in interp(x, xp, fp, left, right)
   1067         return compiled_interp([x], xp, fp, left, right).item()
   1068     else:
-> 1069         return compiled_interp(x, xp, fp, left, right)
   1070 
   1071 

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
```
I understand that NumPy's interpolation does not work on object arrays: the column holds Quantity objects, so its dtype is object (`dtype('O')` in the error).
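For reference, `method='values'` interpolates linearly using the index values as the x coordinates, rather than treating the rows as equally spaced. The plain-Python sketch below shows that idea. The helper `interpolate_by_index` is hypothetical and is not pandas's implementation. As a simplification, it fills only gaps that have a known value on both sides and leaves leading or trailing gaps as `None`.

```python
from typing import List, Optional, Sequence


def interpolate_by_index(index: Sequence[float],
                         values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation, using the index
    values as x coordinates. Edge gaps are left as None."""
    known = [i for i, v in enumerate(values) if v is not None]
    result = list(values)
    # Walk over each pair of consecutive known points.
    for left, right in zip(known, known[1:]):
        x0, x1 = index[left], index[right]
        y0, y1 = values[left], values[right]
        for i in range(left + 1, right):
            # Weight by distance along the index, not by row position.
            t = (index[i] - x0) / (x1 - x0)
            result[i] = y0 + t * (y1 - y0)
    return result


depth2 = [0.0, 1.0, 1.1, 1.5, 2.0]
print(interpolate_by_index(depth2, [0.0, None, 1.1, None, 2.0]))
```

Treating the rows as equally spaced instead would give 0.55 at index 1.0 (halfway between 0.0 and 1.1), which is why the index-aware method matters for irregular depths.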
But if I drop the units first, the interpolation works:
```python
s3 = s2['depth'].astype(float).interpolate(method='values')
```
This gives:
```
s3 =
0.0      0
1.0      1
1.1    1.1
1.5    1.5
2.0      2
Name: depth, dtype: object
```
How can I get the unit back into the depth column?
I can't find any way to reattach the unit...
Any help would be greatly appreciated. Thanks!
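One general pattern is to record the unit before `astype(float)` drops it, interpolate the plain floats, then attach the unit again. The sketch below uses a hypothetical `Quantity` NamedTuple as a stand-in for a real units library, and it assumes the whole column shares a single unit. With `quantities`, the final step would instead multiply each float by the stored unit (for example `[x * pq.m for x in s3]`), which is what the self-answer further down does.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Quantity(NamedTuple):
    """Hypothetical stand-in for a unit-aware value such as pq.Quantity."""
    magnitude: float
    unit: str


def strip_units(column: Sequence[Optional[Quantity]]) -> Tuple[List[Optional[float]], str]:
    """Return (magnitudes, unit), keeping None for missing cells.
    Assumes every non-missing entry uses the same unit."""
    units = {q.unit for q in column if q is not None}
    if len(units) != 1:
        raise ValueError(f"expected exactly one unit, found {sorted(units)}")
    magnitudes = [None if q is None else q.magnitude for q in column]
    return magnitudes, units.pop()


def reattach_units(magnitudes: Sequence[float], unit: str) -> List[Quantity]:
    """Wrap plain floats back into Quantity objects with the stored unit."""
    return [Quantity(m, unit) for m in magnitudes]


column = [Quantity(0.0, "m"), None, Quantity(1.1, "m"), None, Quantity(2.0, "m")]
magnitudes, unit = strip_units(column)
# ... interpolate `magnitudes` here (in pandas: .astype(float).interpolate(method='values')) ...
filled = [0.0, 1.0, 1.1, 1.5, 2.0]  # the interpolated magnitudes shown in s3 above
print(reattach_units(filled, unit))
```

Raising on mixed units is deliberate. Silently picking one unit would mislabel values that were recorded in another unit.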
Here's a way to do what you want.
Split each quantity apart and create a pair of columns for it (the magnitude and the unit):
```
In [80]: df = pd.concat([col.apply(lambda x: pd.Series([x.item(), x.dimensionality.string],
    ...:                                             index=[c, "%s_unit" % c]))
    ...:                 for c, col in s1.items()], axis=1)

In [81]: df
Out[81]:
     depth depth_unit
0.0    0.0          m
1.1    1.1          m
2.0    2.0          m

In [82]: df = df.reindex([0, 1.0, 1.1, 1.5, 2.0])

In [83]: df
Out[83]:
     depth depth_unit
0.0    0.0          m
1.0    NaN        NaN
1.1    1.1          m
1.5    NaN        NaN
2.0    2.0          m
```
Interpolate:
```
In [84]: df['depth'] = df['depth'].interpolate(method='values')
```
Propagate the units:
```
In [85]: df['depth_unit'] = df['depth_unit'].ffill()

In [86]: df
Out[86]:
     depth depth_unit
0.0    0.0          m
1.0    1.0          m
1.1    1.1          m
1.5    1.5          m
2.0    2.0          m
```
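The two-column layout keeps the numeric column interpolable. If a single column is wanted again afterwards, the forward-filled unit strings can be zipped back onto the magnitudes. Below is a plain-Python sketch of the forward fill and that recombination. The `ffill` and `combine` helpers are hypothetical, and the result here is display strings. With `quantities`, building `pq.Quantity(magnitude, unit)` per row would be the analogous step.

```python
from typing import List, Optional, Sequence


def ffill(values: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Forward-fill: replace each None with the last non-None value seen.
    Leading None values stay None, as they do with pandas' ffill."""
    out: List[Optional[str]] = []
    last: Optional[str] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def combine(magnitudes: Sequence[float], units: Sequence[str]) -> List[str]:
    """Rebuild one display column such as '1.5 m' from the two columns."""
    return [f"{m} {u}" for m, u in zip(magnitudes, units)]


depth = [0.0, 1.0, 1.1, 1.5, 2.0]  # after interpolate(method='values')
depth_unit = ffill(["m", None, "m", None, "m"])
print(combine(depth, depth_unit))  # ['0.0 m', '1.0 m', '1.1 m', '1.5 m', '2.0 m']
```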
OK, I found a solution. It might not be the best one, but it works just fine for my problem:
```python
import pandas as pd
import quantities as pq


def extendAndInterpolate(input, newIndex):
    """Extend a pandas DataFrame to newIndex and interpolate the missing values."""
    # Outer-join on the index: rows whose labels exist only in newIndex get NaN
    output = pd.concat([input, pd.DataFrame(index=newIndex)], axis=1)
    for col in output.columns:
        # First value of the column, by position (not by index label)
        first = output[col].iloc[0]
        # (1) Try to retrieve the unit of the current column
        try:
            # if it succeeds, store the unit
            unit = 1 * first.units
        except AttributeError:
            # if it fails, the value has no .units (e.g. the column contains strings),
            # so use a plain factor of 1
            unit = 1
        # (2) Check the type of value
        if isinstance(first, str):
            # strings: fill the missing cells with the previous string
            value = output[col].ffill()
        else:
            # numbers: to be able to interpolate, you need to
            #   (a) drop the unit with astype(float),
            #   (b) interpolate the values,
            #   (c) reattach the unit
            value = [x * unit for x in output[col].astype(float).interpolate(method='values')]
        # (3) Store the extended column with the interpolated values
        output[col] = pd.Series(value, index=output.index)
    # Return the output dataframe
    return output
```
Then:
```python
depth = [0.0, 1.1, 2.0] * pq.m
depth2 = [0, 1, 1.1, 1.5, 2] * pq.m
s1 = pd.DataFrame({'depth': [x for x in depth]}, index=depth)
s2 = extendAndInterpolate(s1, depth2)
```
The result:
```
s1
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

s2
     depth
0.0  0.0 m
1.0  1.0 m
1.1  1.1 m
1.5  1.5 m
2.0  2.0 m
```
Thanks for your help.