Understanding bracket filter syntax in pandas

Submitted by ぃ、小莉子 on 2021-02-11 14:44:11

Question


How does the following filter the results in pandas? For example, with this statement:

df[['name', 'id', 'group']][df.id.notnull()]

I get 426 rows (it keeps only the rows where the tested column IS NOT NULL). However, if I just use that condition by itself inside brackets, I get a bool for each row, {index: bool}:

[df.group.notnull()]

How does the bracket notation work with pandas? Another example would be:

df.id[df.id==458514]            # filters out rows
# vs 
[df.id==458514]                 # returns a bool

Answer 1:


Not a full answer, just a breakdown of df.id[df.id==458514]

  • df.id returns a series with the contents of column id
  • df.id[...] slices that series with either 1) a boolean mask, 2) a single index label or a list of them, or 3) a slice of labels in the form start:end:step. If it receives a boolean mask, the mask must have the same length as the series being sliced. If it receives index label(s), it returns those specific rows. Slicing works much as with Python lists, but start and end can be integer locations or index labels, and a label slice includes its end point (e.g. ['a':'e'] will return all rows in between, including 'e').
  • df.id[df.id==458514] returns a filtered series with your boolean mask, i.e. only the items where df.id equals 458514. It also works with other boolean masks as in df.id[df.name == 'Carl'] or df.id[df.name.isin(['Tom', 'Jerry'])].
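To make the breakdown above concrete, here is a minimal sketch. The data and the index labels 'w' to 'z' are made up for illustration; only the column names and the value 458514 come from the question. It also shows why [df.id==458514] on its own is not a pandas operation at all: the outer brackets are just a Python list literal wrapped around the boolean mask.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame(
    {
        "name": ["Tom", "Jerry", "Carl", "Ann"],
        "id": [458514.0, 2.0, None, 4.0],
        "group": ["a", None, "b", "c"],
    },
    index=["w", "x", "y", "z"],
)

# A comparison on a Series yields a boolean Series (the mask),
# with one True/False per row and the same index as df.
mask = df.id == 458514

# Bare brackets around the mask are plain Python: a one-element list.
wrapped = [mask]

# Indexing the Series with the mask keeps only the rows where it is True.
by_mask = df.id[mask]

# A mask built from another column works too, since it shares df's index.
by_name = df.id[df.name.isin(["Tom", "Jerry"])]

# Label-based slice: both end points are included.
by_labels = df.id["x":"z"]

# The DataFrame statement from the question: select three columns,
# then keep the rows where id is not null.
subset = df[["name", "id", "group"]][df.id.notnull()]

print(type(wrapped))
print(by_mask)
print(by_name)
print(by_labels)
print(subset)
```

In other words, the filtering happens only when the mask is used as the key in something[mask]; the mask expression by itself just produces the per-row booleans.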

Read more in the pandas intro to data structures



Source: https://stackoverflow.com/questions/63713092/understanding-bracket-filter-syntax-in-pandas
