pandas: Create a DataFrame from a list of items that need parsing and matching

天命终不由人 2021-01-26 12:28

I have a list of items, each paired with a string of country names and statuses.

res = [('63(I)[PARA.8]','AFGHANISTAN Y ARGENTINA Y AUSTRALIA Y BELGIUM Y BOLIVIA Y BRAZIL N BYELORUSSIAN SSR Y CANADA


        
1 Answer
  • 2021-01-26 12:54

    The hard part is to parse the list of countries and codes (A, N or Y).

    • Some countries have embedded spaces (e.g., El Salvador).
    • Guatemala has no code in the data (so I used '?').
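
To see why a plain whitespace split is not enough on its own, here is a minimal sketch on a made-up fragment. The `sample` string is hypothetical (it is not taken from `res`), and it only assumes the same "name, then code" layout as the raw data.

```python
# Hypothetical fragment in the same layout as the raw data (not from `res`).
sample = "EL SALVADOR Y FRANCE N GUATEMALA HAITI A"

CODES = {"A", "N", "Y"}

# split() with no argument collapses runs of whitespace.
tokens = sample.split()

# A token counts as a code only if it is exactly A, N or Y;
# every other token is part of a country name.
name_tokens = [t for t in tokens if t not in CODES]
code_tokens = [t for t in tokens if t in CODES]

print(tokens)
print(name_tokens)  # 'EL' and 'SALVADOR' arrive as two separate tokens
print(code_tokens)  # four countries but only three codes: GUATEMALA has none
```

So the name tokens have to be glued back together until a code is seen, and the country without a code needs special handling.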

    First, write a function to convert each tuple to a pandas Series. The 'code' is A, N or Y. Anything else is (part of) the country name.

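    import pandas as pd

    def raw_data_to_series(xs):
        # xs is a (name, values) tuple; values holds country names and codes
        name, values = xs

        # Items without data become an empty, named Series
        if values == 'No Data':
            return pd.Series(dtype='object').rename(name)

        # split() with no argument collapses runs of whitespace
        values = values.split()

        country = ''
        results = dict()

        for x in values:
            if x == 'GUATEMALA':
                # Guatemala has no code in the data, so mark it with '?'
                results[x] = '?'
                country = ''
            elif country == '':
                # first word of a new country name
                country = x
            elif x in {'A', 'N', 'Y'}:
                # a code ends the current country name
                results[country] = x
                country = ''
            else:
                # another word of a multi-word country name
                country = country + ' ' + x

        return pd.Series(results).rename(name)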
    

    Now, we just pass each element of res to the function (using a list comprehension):

    pd.concat([raw_data_to_series(r) for r in res], axis=1)
    
    
    # first 10 lines
                     63(I)[PARA.8] 63(I)[PARA.7] 63(I)[PARA.6] 99(I) 50(I)
    AFGHANISTAN                  Y             Y             Y   NaN   NaN
    ARGENTINA                    Y             Y             Y   NaN   NaN
    AUSTRALIA                    Y             Y             Y   NaN   NaN
    BELGIUM                      Y             Y             Y   NaN   NaN
    BOLIVIA                      Y             Y             Y   NaN   NaN
    BRAZIL                       N             N             N   NaN   NaN
    BYELORUSSIAN SSR             Y             Y             Y   NaN   NaN
    CANADA                       Y             Y             Y   NaN   NaN
    CHILE                        Y             Y             Y   NaN   NaN
    CHINA                        A             A             A   NaN   NaN
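
The NaN entries come from how `pd.concat(..., axis=1)` lines the Series up by index label. A minimal sketch with hand-made Series shows this; the names `item_1` to `item_3` and their values are hypothetical stand-ins for the output of `raw_data_to_series`, and the default `join='outer'` is assumed.

```python
import pandas as pd

# Hypothetical Series standing in for the output of raw_data_to_series.
s1 = pd.Series({"AFGHANISTAN": "Y", "BRAZIL": "N"}, name="item_1")
s2 = pd.Series({"BRAZIL": "Y", "CHINA": "A"}, name="item_2")
# An empty Series, like the one returned for 'No Data'.
s3 = pd.Series(dtype="object", name="item_3")

# axis=1 turns each Series into a column. With the default outer join,
# the row labels are the union of all indexes and gaps are filled with NaN.
df = pd.concat([s1, s2, s3], axis=1)
print(df)
```

Each Series name becomes a column header, so a country missing from one item shows up as NaN in that item's column, and an empty Series becomes a column of NaN only.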
    