Adding values for missing data combinations in Pandas

情深已故 2020-12-18 04:44

I've got a pandas DataFrame containing something like the following:

person_id   status    year    count
0           'pass'    1980    4
0           'fa

2 Answers
  • 2020-12-18 04:56

    Create a MultiIndex with MultiIndex.from_product(), then chain set_index(), reindex(), and reset_index():

    import io
    
    import pandas as pd
    
    all_person_ids = [0, 1, 2]
    all_statuses = ['pass', 'fail']
    all_years = [1980, 1981, 1982]
    # io.StringIO (not BytesIO) is needed to read a str literal on Python 3
    df = pd.read_csv(io.StringIO("""person_id   status    year    count
    0           pass    1980    4
    0           fail    1982    1
    1           pass    1981    2"""), sep=r"\s+")
    names = ["person_id", "status", "year"]
    
    # Full index covering every (person_id, status, year) combination
    mind = pd.MultiIndex.from_product(
        [all_person_ids, all_statuses, all_years], names=names)
    # Missing combinations are added with count filled as 0
    df.set_index(names).reindex(mind, fill_value=0).reset_index()
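
The reindex steps above can also be wrapped in a small helper. This is only a minimal sketch: the function name `fill_missing_combinations` and its parameters are hypothetical (not part of pandas), and it assumes each key combination occurs at most once in df, because reindex needs a unique index.

```python
import io

import pandas as pd


def fill_missing_combinations(df, level_values, value_col="count", fill_value=0):
    """Return df with one row per combination of the values in level_values.

    level_values maps each key column name to the full list of values it may
    take. The helper name and signature are hypothetical, for illustration.
    """
    names = list(level_values)
    full_index = pd.MultiIndex.from_product(
        [level_values[name] for name in names], names=names)
    # reindex adds the missing key combinations and fills their value column.
    return (df.set_index(names)[[value_col]]
              .reindex(full_index, fill_value=fill_value)
              .reset_index())


df = pd.read_csv(io.StringIO("""person_id status year count
0 pass 1980 4
0 fail 1982 1
1 pass 1981 2"""), sep=r"\s+")

result = fill_missing_combinations(df, {
    "person_id": [0, 1, 2],
    "status": ["pass", "fail"],
    "year": [1980, 1981, 1982],
})
print(len(result))            # 3 * 2 * 3 = 18 rows
print(result["count"].sum())  # 4 + 1 + 2 = 7
```

Because fill_value is passed to reindex directly, no NaN is ever introduced and count keeps its integer dtype.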
    
  • 2020-12-18 05:20

    You can use itertools.product to generate all combinations, build a DataFrame from them, left-merge it with your original df, and use fillna to set the missing count values to 0:

    In [77]:
    import itertools
    import pandas as pd
    
    # df is the DataFrame from the question
    all_person_ids = [0, 1, 2]
    all_statuses = ['pass', 'fail']
    all_years = [1980, 1981, 1982]
    combined = [all_person_ids, all_statuses, all_years]
    df1 = pd.DataFrame(columns=['person_id', 'status', 'year'],
                       data=list(itertools.product(*combined)))
    df1
    
    Out[77]:
        person_id status  year
    0           0   pass  1980
    1           0   pass  1981
    2           0   pass  1982
    3           0   fail  1980
    4           0   fail  1981
    5           0   fail  1982
    6           1   pass  1980
    7           1   pass  1981
    8           1   pass  1982
    9           1   fail  1980
    10          1   fail  1981
    11          1   fail  1982
    12          2   pass  1980
    13          2   pass  1981
    14          2   pass  1982
    15          2   fail  1980
    16          2   fail  1981
    17          2   fail  1982
    
    In [82]:
    # how='left' keeps every combination; the join uses the shared columns
    df1 = df1.merge(df, how='left').fillna(0)
    df1
    
    Out[82]:
        person_id status  year  count
    0           0   pass  1980      4
    1           0   pass  1981      0
    2           0   pass  1982      0
    3           0   fail  1980      0
    4           0   fail  1981      0
    5           0   fail  1982      1
    6           1   pass  1980      0
    7           1   pass  1981      2
    8           1   pass  1982      0
    9           1   fail  1980      0
    10          1   fail  1981      0
    11          1   fail  1982      0
    12          2   pass  1980      0
    13          2   pass  1981      0
    14          2   pass  1982      0
    15          2   fail  1980      0
    16          2   fail  1981      0
    17          2   fail  1982      0
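
One thing to watch with the merge approach: a left merge leaves NaN in the unmatched rows, which upcasts an integer count column to float64. A minimal self-contained sketch of the same steps, with the join keys passed explicitly and count cast back to integers afterwards (the variable names `keys`, `full`, and `merged` are just illustrative):

```python
import io
import itertools

import pandas as pd

df = pd.read_csv(io.StringIO("""person_id status year count
0 pass 1980 4
0 fail 1982 1
1 pass 1981 2"""), sep=r"\s+")

keys = ["person_id", "status", "year"]
combos = itertools.product([0, 1, 2], ["pass", "fail"], [1980, 1981, 1982])
full = pd.DataFrame(list(combos), columns=keys)

# Left merge keeps every combination; unmatched rows get NaN in count,
# which makes the column float64.
merged = full.merge(df, on=keys, how="left")

# Fill only the count column, then cast it back to an integer dtype.
merged["count"] = merged["count"].fillna(0).astype(int)
print(merged.dtypes)
```

Filling just the count column, rather than calling fillna(0) on the whole frame, avoids overwriting genuine missing values in any other columns the real data might have.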
    