Python pandas pyhaystack

Question

I am using a module called pyhaystack to retrieve data (over a REST API) from a building automation system based on 'tags'. Python returns a dictionary of the data. I'm trying to use pandas with an if/else statement, shown further below, that I am having trouble with. pyhaystack itself works just fine for getting the data.

This connects me to the automation system: (works just fine)

from pyhaystack.client.niagara import NiagaraHaystackSession
import pandas as pd

session = NiagaraHaystackSession(uri='http://0.0.0.0', username='Z', password='z', pint=True)

This code finds my tags called 'znt', converts dictionary to Pandas, and filters for time: (works just fine for the two points)

znt = session.find_entity(filter_expr='znt').result
znt = session.his_read_frame(znt, rng= '2018-01-01,2018-02-12').result
znt = pd.DataFrame.from_dict(znt)


znt.index.names=['Date']
znt = znt.fillna(method = 'ffill').fillna(method = 'bfill').between_time('08:00','17:00')

What I am most interested in is the column name: ultimately I want Python to return the column name based on conditions:

print(znt.columns)
print(znt.values)

Returns:

Index(['C.Drivers.NiagaraNetwork.Adams_Friendship.points.A-Section.AV1.AV1ZN~2dT', 'C.Drivers.NiagaraNetwork.points.A-Section.AV2.AV2ZN~2dT'], dtype='object')

[[ 65.9087  66.1592]
 [ 65.9079  66.1592]
 [ 65.9079  66.1742]
 ..., 
 [ 69.6563  70.0198]
 [ 69.6563  70.2873]
 [ 69.5673  70.2873]]

I am most interested in this column name of the pandas DataFrame: C.Drivers.NiagaraNetwork.Adams_Friendship.points.A-Section.AV1.AV1ZN~2dT

For my two columns, I subtract the setpoint of 70 from the data in the DataFrame and take the absolute value (works just fine):

znt_sp = 70

deviation = znt - znt_sp

deviation = deviation.abs()

deviation

And this is where I am getting tripped up in pandas. I want Python to print the name of the column if the deviation is greater than four, and otherwise print that the zone is normal. Any tips would be greatly appreciated.

if (deviation > 4).any():
    print('Zone %f does not make setpoint' % deviation)

else:
    print('Zone %f is Normal' % deviation)

The column names in pandas look like this: C.Drivers.NiagaraNetwork.Adams_Friendship.points.A-Section.AV1.AV1ZN~2dT

Answer 1:

Solution:
You can iterate over columns

for col in df.columns:
    if (df[col] > 4).any(): # or .all if needed 
        print('Zone %s does not make setpoint' % col)
    else:
        print('Zone %s is Normal' % col)

Or by defining a function and using apply

def _print(x):
    if (x > 4).any():
        print('Zone %s does not make setpoint' % x.name)
    else:
        print('Zone %s is Normal' % x.name)

df.apply(lambda x: _print(x))
# you can even do
[_print(df[col]) for col in df.columns]

Advice: you may want to keep the result in another structure. Change the function to return a boolean saying whether the column is normal; applying it to the df then gives a boolean Series:

def is_normal(x):
    return not (x > 4).any()

s = df.apply(lambda x: is_normal(x))

# or directly
s = df.apply(lambda x: not (x > 4).any())

This returns a Series s whose index holds the column names of your df and whose values are booleans corresponding to your condition.

You can then use it to get the names of all the normal columns with s[s].index, or the non-normal ones with s[~s].index.

Example: to keep only the normal columns of df, use df[s[s].index].

A complete example
For the example I will use a sample df with a different condition from yours: a column is Normal if no element is 4 or greater; otherwise it does not make the setpoint.

df = pd.DataFrame(dict(a=[1,2,3],b=[2,3,4],c=[3,4,5])) # A sample

print(df)
   a  b  c
0  1  2  3
1  2  3  4
2  3  4  5

Your use case: Print if normal or not - Solution

for col in df.columns:
    if (df[col] >= 4).any():
        print('Zone %s does not make setpoint' % col)
    else:
        print('Zone %s is Normal' % col)

Result

Zone a is Normal
Zone b does not make setpoint
Zone c does not make setpoint

To illustrate my advice: keep the is_normal flags in a Series

s = df.apply(lambda x: not (x >= 4).any()) # Build the series
print(s)

a     True
b    False
c    False
dtype: bool


print(df[s[~s].index])  # columns where s is False
   b  c
0  2  3
1  3  4
2  4  5


print(df[s[s].index])  # columns where s is True
   a
0  1
1  2
2  3
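
The per-column flag can also be computed without apply. The `if (deviation > 4).any():` line in the question fails because `.any()` on a DataFrame reduces each column separately and returns a Series with one boolean per column, which `if` cannot collapse into a single truth value. The sketch below uses that Series directly; the sample frame and the names `zone_a`, `zone_b`, `out_of_range`, `bad_zones` and `ok_zones` are made up for illustration and stand in for the real znt data.

```python
import pandas as pd

# Hypothetical sample standing in for the zone-temperature frame
# (column names and values are invented for this sketch).
znt = pd.DataFrame({
    "zone_a": [67.0, 69.5, 70.2],
    "zone_b": [64.0, 65.5, 75.0],
})
znt_sp = 70  # setpoint, as in the question

deviation = (znt - znt_sp).abs()

# DataFrame.any() reduces down each column: one boolean per column name.
out_of_range = (deviation > 4).any()

for zone, flagged in out_of_range.items():
    if flagged:
        print('Zone %s does not make setpoint' % zone)
    else:
        print('Zone %s is Normal' % zone)

# Column names only, as plain lists.
bad_zones = out_of_range[out_of_range].index.tolist()
ok_zones = out_of_range[~out_of_range].index.tolist()
```

With real data, `bad_zones` would hold the long Niagara point names, so it can be used directly to select those columns from the original frame.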

Answer 2:

I think a DataFrame would be a good way to handle what you want. Starting with znt, you can do all the calculations there:

deviation = znt - 70
deviation = deviation.abs()

# and the cool part is filtering in the df
problem_zones = deviation[deviation['C.Drivers.NiagaraNetwork.Adams_Friendship.points.A-Section.AV1.AV1ZN~2dT'] > 4]

You can play with this and figure out a way to iterate through columns, like :

for each in df.columns:
    # if this column has more than 10 occurrences of a deviation greater than 4...
    if len(df[df[each]>4]) > 10:
        print('This zone has a lot of trouble: ', each)

edit

I like adding columns to a DataFrame instead of just building an external Series.

df['error_for_a'] = df['a'] - 70

This opens up possibilities and keeps everything together. One could then use

df[df['error_for_a']>4]

Again, all() or any() can be useful, but in a real-life scenario we would probably need to trigger the “fault detection” only when a certain number of errors are present.

If the schedule has been set to ‘occupied’ at 8 AM, the first entries may not be correct (any() would trigger an error even if the situation improves 30 minutes later). Another scenario would be a conference room where the error is tiny until people are in it, and then things go bad (all() would not catch that).
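
One way to express that "certain number of errors" idea is to count the out-of-range samples per zone and compare the count against a minimum. This is only a sketch under assumptions: the readings, the 15-minute spacing, the zone names and the `min_samples` threshold are all invented for illustration and would need tuning for a real site.

```python
import pandas as pd

# Hypothetical readings at 15-minute intervals; names and values are made up.
idx = pd.date_range("2018-01-01 08:00", periods=8, freq="15min")
znt = pd.DataFrame({
    "zone_a": [64.0, 65.0, 69.0, 70.0, 70.5, 70.0, 69.5, 70.0],  # cold start, then recovers
    "zone_b": [70.0, 70.0, 75.0, 76.0, 76.5, 77.0, 70.5, 70.0],  # drifts for about an hour
}, index=idx)

# Boolean frame: True wherever the absolute deviation from 70 is above 4.
exceeds = (znt - 70).abs() > 4

# Summing booleans counts the True values, giving out-of-range samples per zone.
counts = exceeds.sum()

min_samples = 3  # assumed threshold, not taken from the original thread
faulty = counts[counts >= min_samples].index.tolist()

# Fraction of samples out of range, if a ratio is easier to reason about.
share = exceeds.mean()

print(counts)
print('Zones with sustained trouble:', faulty)
```

Here `zone_a` is out of range only for its first two samples, so any() would flag it but the count-based check does not, while `zone_b` is flagged even though all() would miss it.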

Source: https://stackoverflow.com/questions/48874706/python-pandas-pyhaystack

Tags

python

python-3.x

pandas

data-science