Python Pandas — Determine if Values in Column 0 Are Repeated in Each Subsequent Column

Submitted by 我与影子孤独终老i on 2019-12-25 04:07:10

Question


I have a collection of internet-connected devices hanging around in various places. I have a dataframe that contains seven rows, one for each day in the past week. Each row contains the serial number of each device that didn't connect to my server that day. I am trying to compile a report that creates an 8th row, which contains the serial number of each device that failed to communicate for seven straight days. Here is a simplified mock-up of my dataframe:

2016-10-01, AAAA, BBBB, CCCC, EEEE
2016-10-02, AAAA, BBBB, EEEE,
2016-10-03, AAAA, BBBB, CCCC, EEEE
2016-10-04, AAAA, BBBB, CCCC, EEEE
2016-10-05, BBBB, CCCC, DDDD, EEEE
2016-10-06, AAAA, BBBB, CCCC, EEEE
2016-10-07, AAAA, BBBB, CCCC, FFFF
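
For anyone who wants to reproduce the problem, the mock-up above can be loaded roughly like this. It is only a sketch: the name `rows` is made up for illustration, and it assumes the frame is built from a list of lists with the date in column 0, as the loop further down does.

```python
import pandas as pd

# Mock-up rows copied from the question: a date followed by the serial
# numbers that failed to connect that day. Rows may differ in length.
rows = [
    ["2016-10-01", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-02", "AAAA", "BBBB", "EEEE"],
    ["2016-10-03", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-04", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-05", "BBBB", "CCCC", "DDDD", "EEEE"],
    ["2016-10-06", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-07", "AAAA", "BBBB", "CCCC", "FFFF"],
]

# pandas pads the shorter row with a missing value, and the columns get
# integer labels 0..4 (column 0 holds the date).
df = pd.DataFrame(rows)
print(df.shape)  # (7, 5)
```

Note that the position of a serial number within a row carries no meaning here, so comparing column against column will not line up the same device across days.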

Here is the code block that is giving me problems. I am trying to compare the values in the first column with the values of each of the other columns. If I get 6 Trues, I add the serial number to a new list then try to tack that on to the dataframe.

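import datetime
import pandas

# localConnection, allInstalledQuery, missingReportQuery, fac and todayDate
# are defined elsewhere in the script.
masterList = []  # assumed to start out empty

# Count every installed device for this facility.
cursor = localConnection.cursor()
cursor.execute(allInstalledQuery % ('', fac[0]))
cursor.fetchall()
totalDevices = cursor.rowcount
cursor.close()

# Build one row per day, from seven days ago up to yesterday.
for i in range(7, 0, -1):
    loopCursor = localConnection.cursor()
    print(i)
    sns = []
    d = datetime.datetime.strptime(todayDate, "%Y-%m-%d") + datetime.timedelta(days=-i)
    sns.append(d.strftime("%Y-%m-%d"))
    if i != 1:
        loopCursor.execute(missingReportQuery % (i, 1, i, 1, '', fac[0]))
    else:
        loopCursor.execute(missingReportQuery % (i, 0, i, 0, '', fac[0]))
    # Fetch the result for every day, not only when i == 1.
    rows = loopCursor.fetchall()
    numMissing = loopCursor.rowcount
    missingSummary = "%d / %d devices missing"
    sns.append(missingSummary % (numMissing, totalDevices))
    for row in rows:
        sns.append(row[4])  # row[4] is taken to be the serial number
    masterList.append(sns)
    loopCursor.close()

df = pandas.DataFrame(masterList)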

EDIT: The following code block is deprecated:

firstDayData = df.iloc[[0], :].values
missingSevenDays = []

for s in firstDayData:
    print(s)
    a = pandas.Series(df[1])
    b = pandas.Series(df[2])
    c = pandas.Series(df[3])
    d = pandas.Series(df[4])
    e = pandas.Series(df[5])
    sn = list(str(s))
    if a.isin(sn) is True and b.isin(sn) is True and c.isin(sn) is True and d.isin(sn) is True \
            and e.isin(sn) is True:
        missingSevenDays.append(sn)
df.append(missingSevenDays)
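
A note on why the deprecated block never appends anything: `Series.isin` returns a boolean Series, and `some_series is True` is an identity check against the `True` singleton, so it is always false. In addition, `list(str(s))` splits the text into single characters instead of producing a list of serial numbers, and `DataFrame.append` returned a new frame instead of modifying `df` (it was removed in pandas 2.0). The "present on every day" idea can be sketched with plain sets instead. The helper name `missing_every_day` is hypothetical, and the sketch assumes column 0 is the only non-serial column, as in the mock-up.

```python
import pandas as pd

def missing_every_day(df):
    """Return the serials found in every row, ignoring the date in column 0."""
    # One set of serials per day; dropna() discards the padding in short rows.
    daily = [set(row.dropna()) for _, row in df.iloc[:, 1:].iterrows()]
    # A device was missing all week only if it is in every day's set.
    return sorted(set.intersection(*daily)) if daily else []

# Mock-up data from the question.
raw = """2016-10-01 AAAA BBBB CCCC EEEE
2016-10-02 AAAA BBBB EEEE
2016-10-03 AAAA BBBB CCCC EEEE
2016-10-04 AAAA BBBB CCCC EEEE
2016-10-05 BBBB CCCC DDDD EEEE
2016-10-06 AAAA BBBB CCCC EEEE
2016-10-07 AAAA BBBB CCCC FFFF"""
df = pd.DataFrame([line.split() for line in raw.splitlines()])

# The identity check is False no matter what the Series contains.
mask = pd.Series(["AAAA", "BBBB"]).isin(["AAAA"])
print(mask is True)           # False

print(missing_every_day(df))  # ['BBBB']
```

With the real frame built by the loop above, the summary string sits in column 1, so the slice would probably need to start at column 2 instead.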

The deprecated block has been replaced by this:

counts = df.stack().value_counts()
seven_day = counts[counts == 7]
filtered_df = df[seven_day.index]
missingSevenDays = []
for neat in filtered_df.values:
    print(neat)

I want this to print out the serial number of all devices that have been missing for 7 days. As it stands, it is just printing out []. I'm afraid I'm completely bewildered as to how to use these data structures.
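
One thing that stands out in the replacement block is that `df[seven_day.index]` selects columns by label instead of looking up values, so it cannot return the serials whatever `counts` holds. The serial numbers are already the index labels of `seven_day`. A minimal sketch of that reading follows, assuming each serial appears at most once per day and that column 0 is the only non-serial column; `missing_seven_days` is an illustrative name.

```python
import pandas as pd

# Mock-up data from the question.
raw = """2016-10-01 AAAA BBBB CCCC EEEE
2016-10-02 AAAA BBBB EEEE
2016-10-03 AAAA BBBB CCCC EEEE
2016-10-04 AAAA BBBB CCCC EEEE
2016-10-05 BBBB CCCC DDDD EEEE
2016-10-06 AAAA BBBB CCCC EEEE
2016-10-07 AAAA BBBB CCCC FFFF"""
df = pd.DataFrame([line.split() for line in raw.splitlines()])

# Count how often each serial occurs across the week, skipping the dates.
counts = df.iloc[:, 1:].stack().value_counts()

# The serials are the index labels of `counts`; no second lookup into df
# is needed. len(df) is the number of days (7 here).
missing_seven_days = sorted(counts[counts == len(df)].index)
print(missing_seven_days)  # ['BBBB']
```

Counting over the whole frame, including column 0 and the summary column, would also count dates and summary strings, which is why the sketch slices them off first.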

Source: https://stackoverflow.com/questions/40244094/python-pandas-determine-if-values-in-column-0-are-repeated-in-each-subsequent
